Abstract

We describe a generic framework for representing and reasoning with annotated Semantic Web data, a task that is becoming
more important with the recent increase in inconsistent and unreliable metadata on the web. We formalise
the annotated language, the corresponding deductive system and address the query answering problem. Previous 
contributions on specific RDF annotation domains are encompassed by our unified reasoning formalism as we show 
by instantiating it on (i) temporal, (ii) fuzzy, and (iii) provenance annotations. Moreover, we provide a generic method 
for combining multiple annotation domains, allowing one to represent, e.g., temporally-annotated fuzzy RDF. Furthermore,
we address the development of a query language - AnQL - that is inspired by SPARQL, including several features 
of SPARQL 1.1 (subqueries, aggregates, assignment, solution modifiers) along with the formal definitions of their 
semantics. 
1. Introduction

RDF (Resource Description Framework) [1] is the widely used representation language for the Semantic Web and the Web of Data. RDF exposes data as triples, consisting of a subject, a predicate and an object, stating that the subject is related to the object by the predicate relation. Several extensions of RDF were proposed in order to deal with time [2, 3, 4], truth or imprecise information [5, 6], trust [7, 8] and provenance [9]. All these proposals share a common approach of extending the RDF language by attaching meta-information about the RDF graph or triples. RDF Schema (RDFS) [10] is the specification of a restricted vocabulary that allows one to deduce further information from existing RDF triples. SPARQL [11] is the W3C-standardised query language for RDF.

In this paper, we present an extension of the RDF model to support meta-information in the form of annotations of triples. We specify the semantics by conservatively extending the RDFS semantics and provide a deductive system for Annotated RDFS. Further, we define a query language that extends SPARQL and includes advanced features such as aggregates, nested queries and variable assignments, which are part of the not-yet-standardised SPARQL 1.1 specification. The present paper is based on and extends two previously published articles introducing Annotated RDFS [12] and AnQL (our SPARQL extension) [13]. In addition to improving the descriptions of this existing body of work, we provide the following novelties:

1. we introduce a use case scenario that better reflects
a realistic example of how annotations can be used; 

2. we detail three concrete domains of annotations 
(temporal, fuzzy, provenance) that were only 
sketched in our previous publications; 

3. we present a detailed and systematic approach for 
combining multiple annotation domains into a new 
single complex domain; this represents the most 
significant novel contribution of the paper; 

4. we discuss the integration of annotated triples with 
standard, non-annotated triples, as well as the inte- 
gration of data using different annotation domains; 
5. we describe a prototype implementation.

Section 2 gives preliminary definitions of the RDFS semantics and query answering, restricting ourselves to the sublanguage ρdf. Our extension of RDF is presented in Section 3, together with essential examples of primitive domains. Our extension of SPARQL is presented in Section 4. Furthermore, Section 5 presents a discussion of important issues with respect to specific domains and their combination. Finally, Section 6 describes our prototype implementation.

Related Work 

The basis for Annotated RDF was first established by Udrea et al. [14, 15], who define triples annotated with values taken from a finite partial order. In their work, triples are of the form (s, p : A, o), where the property, rather than the triple, is annotated. We instead rely on a richer, not necessarily finite, structure and provide additional inference capabilities to [15], such as a more involved propagation of annotation values through schema triples. For instance, in the temporal domain, from (a, sc, b): [2, 6] and (b, sc, c): [3, 8], we will infer (a, sc, c): [3, 6] (sc is the subclass property). Essentially, Udrea et al. do not provide an operation to combine the annotations in such inferences, while the algebraic structures we consider support such operations. Also, they require specific algorithms, while we show that a simple extension to the classical RDF inference rules is sufficient. The query language presented by Udrea et al. consists of conjunctive queries and, while SPARQL's Basic Graph Patterns are compared to their conjunctive queries, they do not consider extending SPARQL with the possibility of querying annotations. Furthermore, OPTIONAL, UNION and FILTER SPARQL queries are not considered, which results in a subset of SPARQL that can be directly translated into their previously presented conjunctive query system.

Adding annotations to logical statements was already proposed in the logic programming realm, in which Kifer & Subrahmanian [16] present a similar approach, where atomic formulas are annotated with a value taken from a lattice of annotation values, an annotation variable or a complex annotation, i.e., a function applied to annotation values or variables. Similarly, we can relate our work to annotated relational databases, especially Green et al. [17], who provide a similar framework for the relational algebra. After presenting a generic structure for annotations, they focus more specifically on the provenance domain. The specificities of the relational algebra, especially the Closed World Assumption, allow them to define a slightly more general structure for annotation domains, namely a semiring (as opposed to the residuated lattice in our initial approach [12, 13]). In relation to our rule-based RDFS reasoning, it should be mentioned that Green et al. [17] also provide an algorithm that can decide ground query answers for annotated Datalog, which might be used for RDFS rules; general query answering or materialisation, though, might not terminate in their case, due to the general structure of annotations. Karvounarakis et al. [18] extend the work of [17] towards various annotations (not only provenance, but also confidence, rank, etc.), but do not specifically discuss their combinations.

For the Semantic Web, several extensions of RDF were proposed in order to deal with specific domains such as truth of imprecise information [5, 19, 20, 6], time [2, 3, 4], trust [7, 8] and provenance [9]. These approaches are detailed in the following paragraphs.

Straccia [6] presents Fuzzy RDF in a general setting where triples are annotated with a degree of truth in [0, 1]. For instance, "Rome is a big city to degree 0.8" can be represented with (Rome, type, BigCity): 0.8; the annotation domain is [0, 1]. For the query language, it formalises conjunctive queries. Other similar approaches for Fuzzy RDF [5, 19, 20] provide the syntax and semantics, along with RDF and RDFS interpretations of the annotated triples. In [20] the author describes an implementation strategy that relies on translating the Fuzzy triples into plain RDF triples by using reification. However, these works focus mostly on the representation format, and the query answering problem is not addressed.

Gutierrez et al. [2] present the definitions of Temporal RDF, including a reduction of the semantics of Temporal RDF graphs to RDF graphs and a sound and complete inference system, and show that entailment of Temporal graphs does not yield extra complexity beyond RDF entailment. Our Annotated RDFS framework encompasses this work by defining the temporal domain. They present conjunctive queries with built-in predicates as the query language for Temporal RDF, although they do not consider full SPARQL. Pugliese et al. [3] present an optimised indexing scheme for Temporal RDF, the notion of a normalised Temporal RDF graph and a query language for these graphs based on SPARQL. The indexing scheme consists of clustering the RDF data based on their temporal distance, for which several metrics are given. For the query language they only define conjunctive queries, thus ignoring some of the more advanced features of SPARQL. Tappolet and Bernstein [4] present another approach to the implementation of Temporal RDF, where each temporal interval is represented as a named graph [21] containing all triples valid in that time period. Information about temporal intervals, such as their relative relations, start and end points, is asserted in the default graph. The t-SPARQL query language allows one to query the temporal RDF representation using an extended SPARQL syntax that can match the graph pattern against the snapshot of a temporal graph at any given time point, and allows one to query the start and end points of a temporal interval, whose values can then be used in other parts of the query.

SPARQL extensions towards querying trust have been presented by Hartig [7]. Hartig introduces a trust-aware query language, tSPARQL, that includes a new constructor to access the trust value of a graph pattern. This value can then be used in other statements such as FILTERs or ORDER. Also in the setting of trust management, Schenk [8] defines a bilattice structure to model trust relying on the dimensions of knowledge and truth. The defined knowledge about trust in information sources can then be used to compute the trust of an inferred statement. An extension towards OWL is presented, but no query language is defined. Finally, this approach is used to resolve inconsistencies in ontologies arising from connecting multiple data sources.

In [9] the authors also present a generic extension of RDF to represent meta information, mostly focused on provenance and uncertainty. Such meta information is stored using named graphs, and their extended semantics of RDF, denoted RDF+, assumes a predefined vocabulary to be interpreted as the meta information. However, they do not provide an extension of the RDFS inference rules or any operations for combining meta information. The authors also provide an extension of the SPARQL query language, considering an additional expression that enables querying the RDF meta information.

Our initial approach of using residuated lattices as the structure for representing annotations [12, 13] was extended to the more general semiring structure by Buneman & Kostylev [22]. Their paper also shows that, once the RDFS inferences of an RDF graph have been computed for a specific domain, it is possible to reuse these inferences if the graph is annotated with a different domain. Based on this result, the authors define a universal domain that can be transformed into other domains by applying the corresponding transformations.

Aidan Hogan's thesis [23, Chapter 6] provides a framework for a specific combination of annotations (authoritativeness, rank, blacklisting) within RDFS and (a variant of) OWL 2 RL. This work is orthogonal to ours, in that it does not focus on aspects of query answering, or on providing a generic framework for combinations of annotations, but rather on scalable and efficient algorithms for materialising inferences for the specific combined annotations under consideration.

2. Preliminaries - Classical RDF and RDFS 

In this section we present notions and definitions that are necessary for the discussion that follows. First we give a short overview of RDF and RDFS.

2.1. Syntax 

Consider pairwise disjoint alphabets U, B, and L denoting, respectively, URI references, blank nodes and literals.¹ We call the elements in UBL terms, and the elements in B variables (denoted x, y, z). An RDF triple is t = (s, p, o) ∈ UBL × U × UBL.² We call s the subject, p the predicate, and o the object. A graph G is a set of triples; the universe of G, universe(G), is the set of elements in UBL that occur in the triples of G; and the vocabulary of G, voc(G), is universe(G) ∩ UL.

We rely on a fragment of RDFS, called ρdf [24], that covers essential features of RDFS. ρdf is defined as the following subset of the RDFS vocabulary: ρdf = {sp, sc, type, dom, range}. Informally, (i) (p, sp, q) means that property p is a subproperty of property q; (ii) (c, sc, d) means that class c is a subclass of class d; (iii) (a, type, b) means that a is of type b; (iv) (p, dom, c) means that the domain of property p is c; and (v) (p, range, c) means that the range of property p is c. In what follows we define a map as a function μ : UBL → UBL preserving URIs and literals, i.e., μ(t) = t for all t ∈ UL. Given a graph G, we define μ(G) = {(μ(s), μ(p), μ(o)) | (s, p, o) ∈ G}. We speak of a map μ from G₁ to G₂, and write μ : G₁ → G₂, if μ is such that μ(G₁) ⊆ G₂.
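To make the notion of a map concrete, the following Python sketch represents a graph as a set of (s, p, o) string tuples and checks whether μ : G₁ → G₂ holds, i.e., whether μ(G₁) ⊆ G₂. It is a minimal illustration under our own assumptions, not part of the formal development: blank nodes are assumed to be strings beginning with "_:", and the helper names apply_map and is_map_between are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Assumption: blank nodes are written as strings starting with '_:'."""
    return term.startswith("_:")


def apply_map(mu: Dict[str, str], graph: Set[Triple]) -> Set[Triple]:
    """Compute mu(G); URIs and literals are always mapped to themselves."""
    def m(term: str) -> str:
        return mu.get(term, term) if is_blank(term) else term
    return {(m(s), m(p), m(o)) for (s, p, o) in graph}


def is_map_between(mu: Dict[str, str], g1: Set[Triple], g2: Set[Triple]) -> bool:
    """mu : G1 -> G2 holds iff mu(G1) is a subset of G2."""
    return apply_map(mu, g1) <= g2


g1 = {("_:x", "type", "googleEmp")}
g2 = {("larryPage", "type", "googleEmp"), ("larryPage", "worksFor", "google")}
print(is_map_between({"_:x": "larryPage"}, g1, g2))  # True
```

Because a map must preserve URIs and literals, any entry of mu for a non-blank term is simply ignored by apply_map.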

2.2. Semantics 

An interpretation I over a vocabulary V is a tuple I = ⟨Δ_R, Δ_P, Δ_C, Δ_L, P[[·]], C[[·]], ·^I⟩, where Δ_R, Δ_P, Δ_C, Δ_L are the interpretation domains of I, which are finite non-empty sets, and P[[·]], C[[·]], ·^I are the interpretation functions of I. They have to satisfy:

1. Δ_R are the resources (the domain or universe of I);

2. Δ_P are property names (not necessarily disjoint from Δ_R);

3. Δ_C ⊆ Δ_R are the classes;

4. Δ_L ⊆ Δ_R are the literal values and contains L ∩ V;

5. P[[·]] is a function P[[·]] : Δ_P → 2^(Δ_R × Δ_R);

6. C[[·]] is a function C[[·]] : Δ_C → 2^(Δ_R);

7. ·^I maps each t ∈ UL ∩ V into a value t^I ∈ Δ_R ∪ Δ_P, and is such that ·^I is the identity for plain literals and assigns an element in Δ_R to each element in L.

¹We assume U, B, and L fixed, and for ease we will denote unions of these sets simply by concatenating their names.
²As in [24] we allow literals for s.

An interpretation I is a model of a ground graph G, denoted I ⊨ G, if and only if I is an interpretation over the vocabulary ρdf ∪ universe(G) that satisfies the following conditions:

Simple:

1. for each (s, p, o) ∈ G, p^I ∈ Δ_P and (s^I, o^I) ∈ P[[p^I]];

Subproperty:

1. P[[sp^I]] is transitive over Δ_P;

2. if (p, q) ∈ P[[sp^I]] then p, q ∈ Δ_P and P[[p]] ⊆ P[[q]];

Subclass:

1. P[[sc^I]] is transitive over Δ_C;

2. if (c, d) ∈ P[[sc^I]] then c, d ∈ Δ_C and C[[c]] ⊆ C[[d]];

Typing I:

1. x ∈ C[[c]] if and only if (x, c) ∈ P[[type^I]];

2. if (p, c) ∈ P[[dom^I]] and (x, y) ∈ P[[p]] then x ∈ C[[c]];

3. if (p, c) ∈ P[[range^I]] and (x, y) ∈ P[[p]] then y ∈ C[[c]];

Typing II:

1. For each e ∈ ρdf, e^I ∈ Δ_P;

2. if (p, c) ∈ P[[dom^I]] then p ∈ Δ_P and c ∈ Δ_C;

3. if (p, c) ∈ P[[range^I]] then p ∈ Δ_P and c ∈ Δ_C;

4. if (x, c) ∈ P[[type^I]] then c ∈ Δ_C.

Entailment among ground graphs G and H is as usual. Now, G ⊨ H, where G and H may contain blank nodes, if and only if for any grounding G' of G there is a grounding H' of H such that G' ⊨ H'.³

Remark 2.1. In [24], the authors define two variants of the semantics: the default one includes reflexivity of P[[sp^I]] (resp. P[[sc^I]]) over Δ_P (resp. Δ_C), but we only consider the alternative semantics presented in [24, Definition 4], which omits this requirement. Thus, we do not support an inference such as G ⊨ (a, sc, a), which is anyway of marginal interest.



Remark 2.2. In a First-Order Logic (FOL) setting, we may interpret classes as unary predicates, and (RDF) predicates as binary predicates. Then

1. a subclass relation between class c and d may be encoded as the formula ∀x. c(x) ⇒ d(x);

2. a subproperty relation between property p and q may be encoded as ∀x∀y. p(x, y) ⇒ q(x, y);

3. domain and range properties may be represented as ∀x∀y. p(x, y) ⇒ c(x) and ∀x∀y. p(x, y) ⇒ c(y);

4. the transitivity of a property can be represented as ∀x∀y∀z. (p(x, z) ∧ p(z, y)) ⇒ p(x, y).

Although this remark is trivial, we will see that it plays an important role in the formalisation of annotated RDFS.

2.3. Deductive system 

In what follows, we provide the sound and complete deductive system for our language, derived from [24]. The system is arranged in groups of rules that capture the semantic conditions of models. In every rule, A, B, C, X, and Y are meta-variables representing elements in UBL, and D, E represent elements in UL. The rules are as follows:

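1. Simple:

(a) $$\frac{G}{G'} \quad \text{for a map } \mu : G' \to G$$
(b) $$\frac{G}{G'} \quad \text{for } G' \subseteq G$$

2. Subproperty:

(a) $$\frac{(A,\mathsf{sp},B),\ (B,\mathsf{sp},C)}{(A,\mathsf{sp},C)}$$
(b) $$\frac{(D,\mathsf{sp},E),\ (X,D,Y)}{(X,E,Y)}$$

3. Subclass:

(a) $$\frac{(A,\mathsf{sc},B),\ (B,\mathsf{sc},C)}{(A,\mathsf{sc},C)}$$
(b) $$\frac{(A,\mathsf{sc},B),\ (X,\mathsf{type},A)}{(X,\mathsf{type},B)}$$

4. Typing:

(a) $$\frac{(D,\mathsf{dom},B),\ (X,D,Y)}{(X,\mathsf{type},B)}$$
(b) $$\frac{(D,\mathsf{range},B),\ (X,D,Y)}{(Y,\mathsf{type},B)}$$

5. Implicit Typing:

(a) $$\frac{(A,\mathsf{dom},B),\ (D,\mathsf{sp},A),\ (X,D,Y)}{(X,\mathsf{type},B)}$$
(b) $$\frac{(A,\mathsf{range},B),\ (D,\mathsf{sp},A),\ (X,D,Y)}{(Y,\mathsf{type},B)}$$

³A grounding G' of graph G is obtained, as usual, by replacing variables in G with terms in UL.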

A reader familiar with [24] will notice that these rules are as rules 1-5 of [24] (which has 7 rules). We excluded the rules handling reflexivity (rules 6-7), which are not needed to answer queries. Furthermore, as noted in [24], the "Implicit Typing" rules are a necessary addition to the rules presented in [25] for complete RDFS entailment. These represent the case when variable A in (D, sp, A) and (A, dom, B) or (A, range, B) is a property implicitly represented by a blank node.

We denote with {τ₁, ..., τₙ} ⊢_RDFS τ that the consequence τ is obtained from the premises τ₁, ..., τₙ by applying one of the inference rules 2-5 above. Note that n ∈ {2, 3}. ⊢_RDFS is extended to the set of all RDFS rules as well, in which case n ∈ {1, 2, 3}.

If a graph G' can be obtained by recursively applying rules 1-5 from a graph G, the sequence of applied rules is called a proof, denoted G ⊢ G', of G' from G. The following proposition shows that our proof mechanism is sound and complete w.r.t. the ρdf semantics:

Proposition 2.1 (Soundness and completeness [24]). Inference ⊢ based on rules 1-5 as of [24] and applied to our semantics defined above is sound and complete for ⊨, that is, G ⊢ G' if and only if G ⊨ G'.

Proposition 2.2 ([24]). Assume G ⊢ G'. Then there is a proof of G' from G where rule (1a) is used at most once and at the end.

Finally, the closure of a graph G is defined as cl(G) = {τ | G ⊢* τ}, where ⊢* is as ⊢ except that rule (1a) is excluded. Note that the size of the closure of G is polynomial in the size of G and that the closure is unique. Now we can prove that:

Proposition 2.3. G ⊢ G' if and only if G' ⊆ cl(G) or G' is obtained from cl(G) by applying rule (1a).
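Because the closure is finite and unique, it can be computed by applying rules 2-5 until a fixpoint is reached. The Python sketch below does this naively over a set-of-tuples representation of a ground graph; it is an illustrative, unoptimised assumption of ours (the function names rdfs_step and closure are hypothetical), and, as in the definition of cl(G), rule (1a) is not applied. The sample graph is an unannotated excerpt of Figure 1 plus a hypothetical domain triple (worksFor, dom, Person).

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rdfs_step(g: Set[Triple]) -> Set[Triple]:
    """Apply rules 2-5 once to every matching combination of premises in g."""
    sp = {(s, o) for (s, p, o) in g if p == "sp"}
    sc = {(s, o) for (s, p, o) in g if p == "sc"}
    dom = {(s, o) for (s, p, o) in g if p == "dom"}
    rng = {(s, o) for (s, p, o) in g if p == "range"}
    new: Set[Triple] = set()
    # 2(a) and 3(a): sp and sc are transitive.
    new |= {(a, "sp", c) for (a, b) in sp for (b2, c) in sp if b == b2}
    new |= {(a, "sc", c) for (a, b) in sc for (b2, c) in sc if b == b2}
    # 3(b): type membership propagates up the class hierarchy.
    new |= {(x, "type", b) for (x, p, a) in g if p == "type"
            for (a2, b) in sc if a == a2}
    for (x, d, y) in g:
        # 2(b): (D, sp, E), (X, D, Y) give (X, E, Y).
        new |= {(x, e, y) for (d2, e) in sp if d == d2}
        # 4(a), 4(b): domain and range typing.
        new |= {(x, "type", b) for (d2, b) in dom if d == d2}
        new |= {(y, "type", b) for (d2, b) in rng if d == d2}
        # 5(a), 5(b): implicit typing through a super-property A of D.
        for (d2, a) in sp:
            if d == d2:
                new |= {(x, "type", b) for (a2, b) in dom if a == a2}
                new |= {(y, "type", b) for (a2, b) in rng if a == a2}
    return new


def closure(g: Set[Triple]) -> Set[Triple]:
    """cl(G): repeat rdfs_step until no new triple is produced."""
    result = set(g)
    while True:
        fresh = rdfs_step(result) - result
        if not fresh:
            return result
        result |= fresh


g = {
    ("youtubeEmp", "sc", "googleEmp"),
    ("chadHurley", "type", "youtubeEmp"),
    ("ceo", "sp", "worksFor"),
    ("niklasZennstrom", "ceo", "skype"),
    ("worksFor", "dom", "Person"),
}
cl = closure(g)
print(("chadHurley", "type", "googleEmp") in cl)  # True
```

The loop terminates because rules 2-5 only recombine terms already in the graph, which is also why the closure stays polynomial in the size of G.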

2.4. Query Answering 

Concerning query answering, we are inspired by [26] and the Logic Programming setting, and we assume that an RDF graph G is ground, that is, blank nodes have been skolemised, i.e., replaced with terms in UL.

A query is of the rule-like form

$$q(\mathbf{x}) \leftarrow \exists \mathbf{y}.\,\varphi(\mathbf{x}, \mathbf{y})$$

where q(x) is the head and ∃y.φ(x, y) is the body of the query, which is a conjunction (we use the symbol "," to denote conjunction in the rule body) of triples τᵢ (1 ≤ i ≤ n); x is a vector of variables occurring in the body, called the distinguished variables; y are so-called non-distinguished variables and are distinct from the variables in x; and each variable occurring in τᵢ is either a distinguished or a non-distinguished variable. If clear from the context, we may omit the existential quantification ∃y.

In a query, we allow built-in triples of the form (s, p, o), where p is a built-in predicate taken from a reserved vocabulary and having a fixed interpretation. We generalise the built-ins to any n-ary predicate p, where p's arguments may be ρdf variables or values from UL, and p has a fixed interpretation. We will assume that the evaluation of the predicate can be decided in finite time.



For convenience, we write "functional predicates"⁴ as assignments of the form x := f(z) and assume that the function f(z) is safe. We also assume that a non-functional built-in predicate p(z) is safe as well. A query example is:

q(x, y) ← (y, created, x), (y, type, Italian), (x, exhibitedAt, Uffizi)

having the intended meaning of retrieving all the artefacts x created by Italian artists y that are exhibited at the Uffizi Gallery.

In order to define an answer to a query we introduce 
the following: 

Definition 2.1 (Query instantiation). Given a vector x = (x₁, ..., x_k) of variables, a substitution over x is a vector of terms t replacing variables in x with terms of UBL. Then, given a query q(x) ← ∃y.φ(x, y), and two substitutions t, t' over x and y, respectively, the query instantiation φ(t, t') is derived from φ(x, y) by replacing x and y with t and t', respectively.

Note that a query instantiation is an RDF graph. 

Definition 2.2 (Entailment). Given a graph G, a query q(x) ← ∃y.φ(x, y), and a vector t of terms in universe(G), we say that q(t) is entailed by G, denoted G ⊨ q(t), if and only if in any model I of G, there is a vector t' of terms in universe(G) such that I is a model of the query instantiation φ(t, t').

Definition 2.3. If G ⊨ q(t) then t is called an answer to q. The answer set of q w.r.t. G is defined as ans(G, q) = {t | G ⊨ q(t)}.

We next show how to compute the answer set. The following can be shown:

Proposition 2.4. Given a graph G, t is an answer to q if and only if there exists an instantiation φ(t, t') that is true in the closure of G (i.e., all triples in φ(t, t') are in cl(G)).

Therefore, we have a simple method to determine ans(G, q): compute the closure cl(G) of G and store it in a database, e.g., using the method of [27]. It is easily verified that any query can be mapped into an SQL query over the underlying database schema. Hence, ans(G, q) can be determined by issuing such an SQL query to the database.
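Proposition 2.4 also suggests a direct in-memory evaluation. Instead of the SQL translation described above, the following Python sketch evaluates a conjunctive query against an already computed closure by extending substitutions one triple pattern at a time, and then projects onto the distinguished variables. It rests on our own assumptions: query variables are strings beginning with "?", built-in predicates are not handled, and the triples about artist1, artwork1, etc. are hypothetical data for the Uffizi query above.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Assumption: query variables are written with a leading '?'."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple,
          sub: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `sub` so that the pattern instantiates to `triple`, or return None."""
    ext = dict(sub)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if ext.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None
    return ext


def answers(closed: Set[Triple], head: List[str],
            body: List[Triple]) -> Set[Tuple[str, ...]]:
    """ans(G, q): instantiations of the body found in cl(G), projected on the head."""
    subs: List[Dict[str, str]] = [{}]
    for pattern in body:
        subs = [s2 for s in subs for t in closed
                if (s2 := match(pattern, t, s)) is not None]
    return {tuple(s[v] for v in head) for s in subs}


closed = {
    ("artist1", "created", "artwork1"), ("artist1", "type", "Italian"),
    ("artwork1", "exhibitedAt", "Uffizi"),
    ("artist2", "created", "artwork2"), ("artist2", "type", "French"),
    ("artwork2", "exhibitedAt", "Uffizi"),
}
query_body = [("?y", "created", "?x"), ("?y", "type", "Italian"),
              ("?x", "exhibitedAt", "Uffizi")]
print(answers(closed, ["?x", "?y"], query_body))  # {('artwork1', 'artist1')}
```

This nested-loop join is only meant to mirror the proposition; a database-backed evaluation as described above would scale far better.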



⁴A predicate p(x, y) is functional if for any t there is a unique t' for which p(t, t') is true.
(youtubeEmp, sc, googleEmp): [2006, 2011]
(steveChen, type, youtubeEmp): [2005, 2011]
(chadHurley, type, youtubeEmp): [2005, 2010]
(jawedKarim, type, youtubeEmp): [2005, 2011]
(jawedKarim, type, paypalEmp): [2000, 2005]
(paypalEmp, sc, ebayEmp): [2002, 2011]
(chadHurley, type, paypalEmp): [2002, 2005]
(skypeEmp, sc, ebayEmp): [2005, 2011]
(SkypeCollab, sc, EbayCollab): [2005, 2009]
(SkypeCollab, sc, EbayCollab): [2009, 2011]
(niklasZennstrom, ceo, skype): [2003, 2007]
(ceo, sp, worksFor): [−∞, +∞]
(larryPage, worksFor, google): [1998, 2011]
(sergeyBrin, worksFor, google): [1998, 2011]

Figure 1: Company acquisition dataset example



3. RDFS with Annotations 

This section presents the extension of RDF towards generic annotations. Throughout this paper we will use an RDF dataset describing companies, acquisitions between companies and employment history. This dataset is partially presented in Figure 1. We consider this data to be annotated with the temporal domain, which intuitively means that the annotated triple is valid at the dates contained in the annotation interval (the exact meaning of the annotations will be explained later). Also, the information in this example can be derived from Wikipedia, and thus we can consider this data also annotated with the provenance domain (although this is not explicitly represented in the example). We follow the modelling of employment records proposed by DBpedia; for instance, a list of employees of Google is available as members of the class http://dbpedia.org/class/yago/GoogleEmployees. For presentation purposes we use the shorter name googleEmp. We also introduce SkypeCollab (resp. EbayCollab) to represent Skype's (resp. Ebay's) collaborators.

3.1. Syntax 

Our approach is to extend triples with annotations, where an annotation is taken from a specific domain.⁵

An annotated triple is an expression τ: A, where τ is a triple and A is an annotation value (defined below). An annotated graph is a finite set of annotated triples. The intended semantics of annotated triples depends, of course, on the meaning we associate with the annotation values. For instance, in a temporal setting [2],

(niklasZennstrom, ceoOf, skype): [2003, 2007]

has the intended meaning "Niklas was CEO of Skype during the period 2003 to 2007", while in the fuzzy setting [6] (skype, ownedBy, bigCompany): 0.3 has the intended meaning "Skype is owned by a big company to a degree not less than 0.3".

3.2. RDFS Annotation Domains 

To start with, let us consider a non-empty set L. Elements in L are our annotation values. For example, in a fuzzy setting, L = [0, 1], while in a typical temporal setting, L may be time points or time intervals. In our annotation framework, an interpretation will map statements to elements of the annotation domain. Our semantics generalises the formulae in Remark 2.2 by using a well-known algebraic structure.

We say that an annotation domain for RDFS is an idempotent, commutative semiring

$$D = \langle L, \oplus, \otimes, \bot, \top \rangle$$

where ⊕ is ⊤-annihilating [22]. That is, for A, Aᵢ ∈ L:

1. ⊕ is idempotent, commutative, associative;

2. ⊗ is commutative and associative;

3. ⊥ ⊕ A = A, ⊤ ⊗ A = A, ⊥ ⊗ A = ⊥, and ⊤ ⊕ A = ⊤;

4. ⊗ is distributive over ⊕, i.e., A₁ ⊗ (A₂ ⊕ A₃) = (A₁ ⊗ A₂) ⊕ (A₁ ⊗ A₃).

⁵The readers familiar with the annotated logic programming framework [16] will notice the similarity of the approaches.

It is well known that there is a natural partial order on any idempotent semiring: an annotation domain D = ⟨L, ⊕, ⊗, ⊥, ⊤⟩ induces a partial order ⪯ over L defined as:

$$A_1 \preceq A_2 \quad \text{if and only if} \quad A_1 \oplus A_2 = A_2 .$$

The order ⪯ is used to express redundant/entailed/subsumed information. For instance, for temporal intervals, an annotated triple (s, p, o): [2000, 2006] entails (s, p, o): [2003, 2004], as [2003, 2004] ⊆ [2000, 2006] (here, ⊆ plays the role of ⪯).
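The induced order can be computed directly from ⊕. The Python sketch below assumes, purely for illustration, that a temporal annotation is encoded as the finite set of whole years it covers, so that set union plays the role of ⊕ (as in Remark 3.4 below); the encoding and the helper names interval, oplus, leq and fuzzy_leq are hypothetical. It checks the entailment example just given and also shows the fuzzy case, where ⊕ = max makes ⪯ coincide with ≤.

```python
from typing import FrozenSet

Years = FrozenSet[int]


def interval(start: int, end: int) -> Years:
    """Hypothetical encoding of the closed interval [start, end] as a set of years."""
    return frozenset(range(start, end + 1))


def oplus(a: Years, b: Years) -> Years:
    """Union plays the role of ⊕ for temporal annotations."""
    return a | b


def leq(a: Years, b: Years) -> bool:
    """Induced order: a ⪯ b if and only if a ⊕ b = b."""
    return oplus(a, b) == b


def fuzzy_leq(a: float, b: float) -> bool:
    """Fuzzy case: with ⊕ = max, a ⪯ b iff max(a, b) = b, i.e. iff a <= b."""
    return max(a, b) == b


print(leq(interval(2003, 2004), interval(2000, 2006)))  # True
print(leq(interval(2000, 2006), interval(2003, 2004)))  # False
```

Encoding annotations as sets of years (rather than as single intervals) keeps ⊕ closed: the union of two disjoint intervals is not an interval, but it is still a set of years.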

Remark 3.1. In previous work [12, 13], an annotation domain was assumed to be a more specific structure, namely a residuated bounded lattice D = ⟨L, ⪯, ∧, ∨, ⊗, ⇒, ⊥, ⊤⟩. That is,

1. ⟨L, ⪯, ∧, ∨, ⊥, ⊤⟩ is a bounded lattice, where ⊥ and ⊤ are the bottom and top elements, and ∧ and ∨ are the meet and join operators;

2. ⟨L, ⊗, ⊤⟩ is a commutative monoid;

3. ⇒ is the so-called residuum of ⊗, i.e., for all A₁, A₂, A₃, A₁ ⊗ A₃ ⪯ A₂ if and only if A₃ ⪯ (A₁ ⇒ A₂).

Note that any bounded residuated lattice satisfies the conditions of an annotation domain. In [22] it was shown that we may use a slightly weaker structure than residuated lattices for annotation domains.

Remark 3.2. Observe that ⟨L, ⪯, ⊕, ⊥, ⊤⟩ is a bounded join semi-lattice.

Remark 3.3. Note that the domain D₀₁ = ⟨{0, 1}, max, min, 0, 1⟩ corresponds to the boolean case. In fact, in this case annotated RDFS turns out to be the same as classical RDFS.

Remark 3.4. We use ⊕ to combine information about the same statement. For instance, in temporal logic, from τ: [2000, 2006] and τ: [2003, 2008], we infer τ: [2000, 2008], as [2000, 2008] = [2000, 2006] ∪ [2003, 2008]; here, ∪ plays the role of ⊕. In the fuzzy context, from τ: 0.7 and τ: 0.6, we infer τ: 0.7, as 0.7 = max(0.7, 0.6) (here, max plays the role of ⊕).

Remark 3.5. We use ⊗ to model the "conjunction" of information. In fact, ⊗ is a generalisation of boolean conjunction to the many-valued case. Moreover, ⊗ also satisfies the following:

1. ⊗ is bounded, i.e., A₁ ⊗ A₂ ⪯ A₁;

2. ⊗ is ⪯-monotone, i.e., for A₁ ⪯ A₂, A ⊗ A₁ ⪯ A ⊗ A₂.

Both properties follow from the axioms above; e.g., boundedness holds because (A₁ ⊗ A₂) ⊕ A₁ = A₁ ⊗ (A₂ ⊕ ⊤) = A₁ ⊗ ⊤ = A₁. For instance, in interval-valued temporal logic, from (a, sc, b): [2000, 2006] and (b, sc, c): [2003, 2008], we will infer (a, sc, c): [2003, 2006], as [2003, 2006] = [2000, 2006] ∩ [2003, 2008]; here, ∩ plays the role of ⊗.⁶ In the fuzzy context, one may choose any t-norm [28, 29], e.g., product, and thus, from (a, sc, b): 0.7 and (b, sc, c): 0.6, we will infer (a, sc, c): 0.42, as 0.42 = 0.7 · 0.6 (here, · plays the role of ⊗).

Remark 3.6. Observe that the distributivity condition is used to guarantee that, e.g., we obtain the same annotation A₁ ⊗ (A₂ ⊕ A₃) = (A₁ ⊗ A₂) ⊕ (A₁ ⊗ A₃) of the triple (a, sc, c) that can be inferred from triples (a, sc, b): A₁, (b, sc, c): A₂ and (b, sc, c): A₃.
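The conditions of an annotation domain, including the distributivity law discussed in Remark 3.6, can be spot-checked mechanically. The Python sketch below does so for the fuzzy domain with ⊕ = max and the product t-norm as ⊗, over a finite sample of values represented as exact fractions to avoid floating-point rounding. It is an illustrative test under our own assumptions (check_domain and product_tnorm are hypothetical names), not a proof that the laws hold on all of [0, 1].

```python
from fractions import Fraction
from itertools import product


def check_domain(values, oplus, otimes, bottom, top) -> bool:
    """Spot-check the annotation-domain laws on every triple drawn from `values`."""
    for a, b, c in product(values, repeat=3):
        laws = [
            oplus(a, a) == a,                                     # ⊕ idempotent
            oplus(a, b) == oplus(b, a),                           # ⊕ commutative
            oplus(a, oplus(b, c)) == oplus(oplus(a, b), c),       # ⊕ associative
            otimes(a, b) == otimes(b, a),                         # ⊗ commutative
            otimes(a, otimes(b, c)) == otimes(otimes(a, b), c),   # ⊗ associative
            oplus(bottom, a) == a and otimes(top, a) == a,        # identities
            otimes(bottom, a) == bottom and oplus(top, a) == top, # annihilation
            otimes(a, oplus(b, c)) == oplus(otimes(a, b), otimes(a, c)),  # distributivity
        ]
        if not all(laws):
            return False
    return True


def product_tnorm(a: Fraction, b: Fraction) -> Fraction:
    return a * b


SAMPLE = [Fraction(k, 10) for k in range(11)]
print(check_domain(SAMPLE, max, product_tnorm, Fraction(0), Fraction(1)))  # True

# Remark 3.6: both ways of deriving (a, sc, c) yield the same annotation.
a1, a2, a3 = Fraction(7, 10), Fraction(6, 10), Fraction(1, 2)
print(a1 * max(a2, a3) == max(a1 * a2, a1 * a3))  # True
```

An operation such as the arithmetic mean fails this check (it is neither associative nor ⊤-neutral), which illustrates why ⊗ must be chosen with care.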



⁶As we will see, ⊕ and ⊗ may be more involved.



Finally, note that, conceptually, in order to build an annotation domain, one has to:

1. determine the set of annotation values L (typically a countable set⁷) and identify the top and bottom elements;

2. define suitable operations ⊗ and ⊕ that act as "conjunction" and "disjunction" functions, to support the intended inference over schema axioms, such as

"from (a, sc, b): A and (b, sc, c): A' infer (a, sc, c): A ⊗ A'"

and

"from τ: A and τ: A' infer τ: A ⊕ A'".
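The two steps above can be sketched generically. The following Python code is an illustrative assumption rather than the implementation described in this paper: it bundles a domain ⟨L, ⊕, ⊗, ⊥, ⊤⟩ into a hypothetical AnnotationDomain record, stores an annotated graph as a dictionary from triples to annotation values (so that repeated statements about the same triple are merged with ⊕), and implements one pass of the sc-transitivity template with ⊗. The fuzzy instance with max and product uses exact fractions, so the values of Remarks 3.4 and 3.5 can be checked exactly.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Dict, Generic, Tuple, TypeVar

A = TypeVar("A")
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class AnnotationDomain(Generic[A]):
    """Hypothetical record for an annotation domain <L, ⊕, ⊗, ⊥, ⊤>."""
    oplus: Callable[[A, A], A]
    otimes: Callable[[A, A], A]
    bottom: A
    top: A


def add_annotated(graph: Dict[Triple, A], t: Triple, ann: A,
                  d: AnnotationDomain[A]) -> None:
    """From t: A and t: A' infer t: A ⊕ A' (a new triple starts from ⊥)."""
    graph[t] = d.oplus(graph.get(t, d.bottom), ann)


def sc_transitivity_pass(graph: Dict[Triple, A], d: AnnotationDomain[A]) -> None:
    """From (a, sc, b): A and (b, sc, c): A' infer (a, sc, c): A ⊗ A' (one pass)."""
    sc = [(t, ann) for t, ann in graph.items() if t[1] == "sc"]  # snapshot
    for (a, _, b), ann1 in sc:
        for (b2, _, c), ann2 in sc:
            if b == b2:
                add_annotated(graph, (a, "sc", c), d.otimes(ann1, ann2), d)


fuzzy = AnnotationDomain(oplus=max, otimes=lambda x, y: x * y,
                         bottom=Fraction(0), top=Fraction(1))

g: Dict[Triple, Fraction] = {}
add_annotated(g, ("a", "sc", "b"), Fraction(7, 10), fuzzy)
add_annotated(g, ("b", "sc", "c"), Fraction(6, 10), fuzzy)
sc_transitivity_pass(g, fuzzy)
print(g[("a", "sc", "c")])  # 21/50, i.e. 0.42 as in Remark 3.5
```

Storing one annotation per triple and folding new evidence in with ⊕ is a design choice of this sketch; it relies on ⊕ being idempotent, commutative and associative, so the result does not depend on the order in which statements arrive.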

3.3. Semantics 

Fix an annotation domain D = ⟨L, ⊕, ⊗, ⊥, ⊤⟩. Informally, an interpretation I will assign to a triple τ an element of the annotation domain A ∈ L. Formally, an annotated interpretation I over a vocabulary V is a tuple

$$I = \langle \Delta_R, \Delta_P, \Delta_C, \Delta_L, P[\![\cdot]\!], C[\![\cdot]\!], \cdot^I \rangle$$

where Δ_R, Δ_P, Δ_C, Δ_L are interpretation domains of I and P[[·]], C[[·]], ·^I are interpretation functions of I. They have to satisfy:

1. Δ_R is a nonempty finite set of resources, called the domain or universe of I;

2. Δ_P is a finite set of property names (not necessarily disjoint from Δ_R);

3. Δ_C ⊆ Δ_R is a distinguished subset of Δ_R identifying if a resource denotes a class of resources;

4. Δ_L ⊆ Δ_R, the set of literal values, contains all plain literals in L ∩ V;

5. P[[·]] maps each property name p ∈ Δ_P into a function P[[p]] : Δ_R × Δ_R → L, i.e., assigns an annotation value to each pair of resources;

6. C[[·]] maps each class c ∈ Δ_C into a function C[[c]] : Δ_R → L, i.e., assigns an annotation value representing class membership in c to every resource;

7. ·^I maps each t ∈ UL ∩ V into a value t^I ∈ Δ_R ∪ Δ_P, and is such that ·^I is the identity for plain literals and assigns an element in Δ_R to each element in L.

⁷Note that one may use XML decimals in [0, 1] in place of real numbers for the fuzzy domain.

An interpretation I is a model of an annotated ground graph G, denoted I ⊨ G, if and only if I is an interpretation over the vocabulary ρdf ∪ universe(G) that satisfies the following conditions:

Simple:

1. (s, p, o): A ∈ G implies p^I ∈ Δ_P and P[[p^I]](s^I, o^I) ⪰ A;

Subproperty:

1. P[[sp^I]](p, q) ⊗ P[[sp^I]](q, r) ⪯ P[[sp^I]](p, r);

2. P[[p]](x, y) ⊗ P[[sp^I]](p, q) ⪯ P[[q]](x, y);

Subclass:

1. P[[sc^I]](c, d) ⊗ P[[sc^I]](d, e) ⪯ P[[sc^I]](c, e);

2. C[[c]](x) ⊗ P[[sc^I]](c, d) ⪯ C[[d]](x);

Typing I:

1. C[[c]](x) = P[[type^I]](x, c);

2. P[[dom^I]](p, c) ⊗ P[[p]](x, y) ⪯ C[[c]](x);

3. P[[range^I]](p, c) ⊗ P[[p]](x, y) ⪯ C[[c]](y);

Typing II:

1. For each e ∈ ρdf, e^I ∈ Δ_P;

2. P[[sp^I]](p, q) is defined only for p, q ∈ Δ_P;

3. P[[sc^I]](c, d) is defined only for c, d ∈ Δ_C;

4. P[[dom^I]](p, c) is defined only for p ∈ Δ_P and c ∈ Δ_C;

5. P[[range^I]](p, c) is defined only for p ∈ Δ_P and c ∈ Δ_C;

6. P[[type^I]](s, c) is defined only for c ∈ Δ_C.

Intuitively, a triple (s, p, o): A is satisfied by I if (s, o) belongs to the extension of p to a "wider" extent than A. Note that the major differences from the classical setting lie in items 5 and 6 of the definition of an annotated interpretation.

We further note that the classical setting corresponds to the case in which the annotation domain has $L = \{0, 1\}$.

Finally, entailment among annotated ground graphs $G$ and $H$ is as usual. Now, $G \models H$, where $G$ and $H$ may contain blank nodes, if and only if for any grounding $G'$ of $G$ there is a grounding $H'$ of $H$ such that $G' \models H'$.

Remark 3.7. Note that we always have that $G \models \tau\colon\bot$. Clearly, triples of the form $\tau\colon\bot$ are uninteresting and, thus, in the following we do not consider them as part of the language.

As for the crisp case, it can be shown that: 



Proposition 3.1. Any annotated RDFS graph has a finite model.

Proof 3.1. Let $G$ be an annotated graph over domain $D$. Let $Lit = \mathbf{L} \cap universe(G)$ be the set of literals present in $G$ and $l_0 \in Lit$. We define the interpretation $I$ over $V$ as follows:

1. $\Delta_R = \Delta_P = \Delta_C = \Delta_L = Lit$;

2. $\forall x, y, p$: $P[\![p]\!](x, y) \mapsto \top$;

3. $\forall x, c$: $C[\![c]\!](x) \mapsto \top$;

4. (a) $\forall l \in \mathbf{L}$, $l^I = l$;
(b) $\forall l \in V \setminus \mathbf{L}$, $l^I = l_0$.

It is easy to see that $I$ satisfies all the model conditions listed above and thus is a model of $G$.

Therefore, we do not have to care about consistency. 

3.4. Examples of primitive domains 

To demonstrate the power of our approach, we illustrate its application to some domains: fuzzy [6], temporal [2] and provenance.

3.4.1. The fuzzy domain 

To model fuzzy RDFS [6] we may define the annotation domain as $D_{[0,1]} = \langle [0, 1], \max, \otimes, 0, 1 \rangle$, where $\otimes$ is any continuous t-norm on $[0, 1]$.

Example 3.1. Adapting our example of employment records to the fuzzy domain, we can state the following: Skype collaborators are also Ebay collaborators to some degree, since Ebay possesses 30% of Skype's shares, and Toivo is a part-time Skype collaborator:

(SkypeCollab, sc, EbayCollab): 0.3
(toivo, type, SkypeCollab): 0.5

Then, e.g., under the product t-norm $\otimes$, we can infer the following triple:

(toivo, type, EbayCollab): 0.15
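The fuzzy domain can be sketched in a few lines of Python. This is a minimal illustration, assuming annotations are plain floats in [0, 1]; the class name `FuzzyDomain` and its methods are hypothetical, and the product and Gödel (minimum) t-norms are shown as two standard continuous t-norms. Chaining the two premises of Example 3.1 with $\otimes$ reproduces the inferred degree.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class FuzzyDomain:
    """Hypothetical sketch of the domain <[0,1], max, t-norm, 0, 1>."""
    tnorm: Callable[[float, float], float]
    bottom: float = 0.0
    top: float = 1.0

    def join(self, a: float, b: float) -> float:
        # join is max: the same triple asserted twice keeps the higher degree
        return max(a, b)

    def meet(self, a: float, b: float) -> float:
        # meet is the chosen continuous t-norm, used to chain rule premises
        return self.tnorm(a, b)

    def leq(self, a: float, b: float) -> bool:
        return a <= b

product = FuzzyDomain(tnorm=lambda a, b: a * b)
goedel = FuzzyDomain(tnorm=min)

# (SkypeCollab, sc, EbayCollab):0.3 and (toivo, type, SkypeCollab):0.5
degree_product = product.meet(0.3, 0.5)   # 0.15, as in Example 3.1
degree_goedel = goedel.meet(0.3, 0.5)     # 0.3 under the minimum t-norm
```

The choice of t-norm is a parameter of the domain: the same premises yield different degrees depending on how "conjunction" is interpreted.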



3.4.2. The temporal domain 

Most of the semantic information on the Web deals with time in an implicit or explicit way. Social relation graphs, personal profiles and information about various entities continuously evolve and do not remain static. This dynamism can take various forms: certain information is only valid in a specific time interval (e.g., somebody's address), some data talks about events that took place at a specific time point in the past (e.g., the beginning of a conference), some data describe eternal truths (e.g., tigers are mammals) or truths that are valid from a certain point of time onwards forever (e.g., Elvis is dead), or creation or change dates of online information items (e.g., the edit history of a wiki page). We believe that treating web data in a time-sensitive way is one of the biggest steps towards turning the Semantic Web idea into reality.

Precise temporal information. For our representation of the temporal domain we aim at using non-discrete time, as it is necessary to model temporal intervals with any precision; however, for presentation purposes we will show the dates as years only.

Modelling the temporal domain. To start with, time points are elements of the value space $\mathbb{Q} \cup \{-\infty, +\infty\}$. A temporal interval is a non-empty interval $[\alpha_1, \alpha_2]$, where the $\alpha_i$ are time points. An empty interval is denoted as $\emptyset$. We define a partial order on intervals as $I_1 \preceq I_2$ if and only if $I_1 \subseteq I_2$. The intuition here is that if a triple is true at time points in $I_2$ and $I_1 \preceq I_2$ then, in particular, it is true at any time point in $I_1 \neq \emptyset$.

Now, the set of intervals might seem a natural candidate for $L$, but it is not. The reason is that, e.g., in order to represent the upper bound of $\tau\colon[1, 5]$ and $\tau\colon[8, 9]$ we rather need the union of intervals, denoted $\{[1, 5], [8, 9]\}$, meaning that a triple is true both in the former as well as in the latter interval. Now, we define $L$ as (where $\bot = \{\emptyset\}$, $\top = \{[-\infty, +\infty]\}$)

$$L = \{t \mid t \text{ is a finite set of disjoint temporal intervals}\} \cup \{\bot, \top\}.$$

Therefore, a temporal term is an element $t \in L$, i.e., a set of pairwise disjoint time intervals. We allow writing $[\alpha]$ as a shorthand for $[\alpha, \alpha]$, $\tau\colon\alpha$ as a shorthand for $\tau\colon\{[\alpha]\}$ and $\tau\colon[\alpha, \alpha']$ as a shorthand for $\tau\colon\{[\alpha, \alpha']\}$. Furthermore, on $L$ we define the following partial order:

$$t_1 \preceq t_2 \text{ if and only if } \forall I_1 \in t_1\ \exists I_2 \in t_2 \text{ such that } I_1 \preceq I_2.$$

Please note that $\preceq$ is the Hoare order on power sets [30], which is a pre-order. For the anti-symmetry property, assume that $t_1 \preceq t_2$ and $t_2 \preceq t_1$; so for $I_1 \in t_1$, there is $I_2 \in t_2$ for which there is $I_3 \in t_1$ such that $I_1 \subseteq I_2 \subseteq I_3$. But the intervals in $t_1$ are pairwise disjoint, so $I_1 \subseteq I_3$ forces $I_1 = I_3$ and, thus, $I_1 = I_2 = I_3$. So, $t_1 = t_2$ and, thus, $\preceq$ is a partial order. Similarly as for time intervals, the intuition for $\preceq$ is that if a triple is true at time points in intervals in $t_2$ and $t_1 \preceq t_2$, then, in particular, it is true at any time point in intervals in $t_1$. Essentially, if $t_1 \preceq t_2$ then a temporal triple $\tau\colon t_2$ is true to a larger "temporal extent" than the temporal triple $\tau\colon t_1$. It can also be verified that $\langle L, \preceq, \bot, \top \rangle$ is a bounded lattice.

Indeed, as far as we are concerned, the partial order $\preceq$ induces the following join ($\oplus$) operation on $L$. Intuitively, if a triple is true at $t_1$ and also true at $t_2$ then it will also be true for the time points specified by $t_1 \oplus t_2$ (a kind of union of time points). As an example, if $\tau\colon\{[2, 5], [8, 12]\}$ and $\tau\colon\{[4, 6], [9, 15]\}$ are true then we expect that this is the same as saying that $\tau\colon\{[2, 6], [8, 15]\}$ is true. The join operator will be defined in such a way that $\{[2, 5], [8, 12]\} \oplus \{[4, 6], [9, 15]\} = \{[2, 6], [8, 15]\}$. Operationally, this means that $t_1 \oplus t_2$ will be obtained as follows: (i) take the union of the sets of intervals $t = t_1 \cup t_2$; and (ii) join overlapping intervals in $t$ until no more overlapping intervals can be obtained. Formally,

$$t_1 \oplus t_2 = \inf\{t \mid t \succeq t_i,\ i = 1, 2\}.$$

It remains to define the meet $\otimes$ over sets of intervals. Intuitively, we would like to support inferences such as "from (a, sc, b): {[2, 5], [8, 12]} and (b, sc, c): {[4, 6], [9, 15]} infer (a, sc, c): {[4, 5], [9, 12]}", where $\{[2, 5], [8, 12]\} \otimes \{[4, 6], [9, 15]\} = \{[4, 5], [9, 12]\}$. We get it by means of

$$t_1 \otimes t_2 = \sup\{t \mid t \preceq t_i,\ i = 1, 2\}.$$

Note that here the t-norm used for modelling "conjunc- 
tion" coincides with the lattice meet operator. 

Example 3.2. Using the data from our running example, we can infer that

(chadHurley, type, googleEmp): [2006, 2010]

where

$$\{[2005, 2010]\} \otimes \{[2006, 2011]\} = \{[2006, 2010]\}.$$

Further features are described in [2], such as a "Now" time point (which is just a defined time point in the temporal domain) and anonymous time points, which allow stating that a triple is true at some point. Adding anonymous time points would require us to extend the lattice by appropriate operators, e.g., $[4, T] \oplus [T, 8] = [4, 8]$ (where $T$ is an anonymous time point), etc.

3.4.3. Provenance domain 

Identifying the provenance of triples is regarded as an important issue for dealing with the heterogeneity of Web Data, and several proposals have been made to model provenance [31, 32, 33, 34]. Typically, provenance is identified by a URI, usually the URI of the document in which the triples are defined or possibly a URI identifying a named graph. However, the provenance of inferred triples is an issue that has been little tackled in the literature [35, 33]. We propose to address this issue by introducing an annotation domain for provenance.
The intuition behind our approach is similar to the one of [35] and [33], where the provenance of an inferred triple is defined as the aggregation of the provenances of the documents that allow inferring that triple. For instance, if a document $d_1$ defines (youtubeEmp, sc, googleEmp): $d_1$ and a second document $d_2$ defines (chadHurley, type, youtubeEmp): $d_2$, then we can infer (chadHurley, type, googleEmp): $d_1 \wedge d_2$.

Such a mechanism makes sense and would fit well as a meet operator, but these approaches do not address the join operation, which should take place when identical triples are annotated differently. We improve on this with the following formalisation.

Modelling the provenance domain. We start from a countably infinite set of atomic provenances $P$ which, in practice, can be represented by URIs. We consider the propositional formulae made from symbols in $P$ (atomic propositions), logical or ($\vee$) and logical and ($\wedge$), for which we have the standard entailment $\models$. A provenance value is an equivalence class for the logical equivalence relation, i.e., the set of annotation values is the quotient set of these formulae by logical equivalence. The order relation is $\models$, and $\otimes$ and $\oplus$ are $\wedge$ and $\vee$, respectively. We set $\top$ to true and $\bot$ to false.

Example 3.3. Consider the following data: 

(chadHurley, worksFor, youtube) : chad 
(chadHurley, type, Person): chad 
(youtube, type, Company) : chad 
(Person, SC, Agent): foaf 
(worksFor, dom, Person): workont 
(worksFor, range, Company) : workont 

We can deduce that chadHurley is an Agent in two different ways: using the first, fourth and fifth statements, or using the second and fourth statements. So, it is possible to infer the following annotated triple:

(chadHurley, type, Agent): $(chad \wedge foaf \wedge workont) \vee (chad \wedge foaf)$

However, since $(chad \wedge foaf \wedge workont) \vee (chad \wedge foaf)$ is logically equivalent to $chad \wedge foaf$, the aggregated inference can be collapsed into:

(chadHurley, type, Agent): $chad \wedge foaf$
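A compact way to compute with provenance values is to keep each formula in positive disjunctive normal form and apply absorption. The Python sketch below is an illustration under stated assumptions (a clause is a `frozenset` of atomic provenance URIs, a value is a `frozenset` of clauses, and all names are hypothetical). Since the formulae use only $\wedge$ and $\vee$, the absorbed DNF is exactly the set of prime implicants, so logically equivalent formulae receive the same representation. It reproduces the collapse in Example 3.3.

```python
from typing import FrozenSet

Clause = FrozenSet[str]               # conjunction of atomic provenances
Prov = FrozenSet[Clause]              # disjunction of clauses (positive DNF)

TOP: Prov = frozenset({frozenset()})  # empty conjunction = true
BOTTOM: Prov = frozenset()            # empty disjunction = false

def atom(uri: str) -> Prov:
    return frozenset({frozenset({uri})})

def minimise(clauses) -> Prov:
    """Absorption: drop every clause that strictly contains another clause."""
    clauses = set(clauses)
    return frozenset(c for c in clauses if not any(d < c for d in clauses))

def join(p: Prov, q: Prov) -> Prov:   # logical or
    return minimise(p | q)

def meet(p: Prov, q: Prov) -> Prov:   # logical and, distributed over or
    return minimise(c | d for c in p for d in q)

def leq(p: Prov, q: Prov) -> bool:    # p entails q
    return all(any(d <= c for d in q) for c in p)

# Example 3.3: two derivations of (chadHurley, type, Agent)
chad, foaf, workont = atom("chad"), atom("foaf"), atom("workont")
via_domain = meet(meet(chad, workont), foaf)   # statements 1, 5 and 4
via_type = meet(chad, foaf)                    # statements 2 and 4
agent = join(via_domain, via_type)             # collapses to chad ∧ foaf
```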

Intuitively, a URI denoting a provenance can also denote an RDF graph, either by using a named graph approach, or implicitly by getting an RDF document by dereferencing the URI. In this case, we can see the conjunction operation as a union of graphs and disjunction as an intersection of graphs.



Comparison with other approaches. [35] does not formalise the semantics and properties of its aggregation operation (simply denoted by $\wedge$), nor the exact rules that should be applied to correctly and completely reason with provenance. Query answering is not tackled either.

The authors of [33] provide more insight into the formalisation and actually detail the rules by reusing (tacitly) [4]. They also provide a formalisation of a simple query language. However, the semantics they define is based on a strong restriction of ρdf.$^{8}$

As an example, they define the answers to the query (?x, type, ?y, ?c) as the tuples (X, Y, C) such that there is a triple (X, type, Y, C) which can be inferred from only the application of rules (3a) and (3b). This means that a domain or range assertion would not provide additional answers to that type of query.

Finally, none of those papers discusses the possibility of universally true statements (the $\top$ provenance) or statements from unknown provenance ($\bot$). They also do not consider mixing non-annotated triples with annotated ones as we do in Section 5.3.

3.5. Deductive system 

An important feature of our framework is that we are able to provide a deductive system in the style of the one for classical RDFS. Moreover, the schemata of the rules are the same for any annotation domain (only support for the domain-dependent $\otimes$ and $\oplus$ operations has to be provided) and, thus, are amenable to an easy implementation on top of existing systems. The rules are arranged in groups that capture the semantic conditions of models; $A$, $B$, $C$, $X$ and $Y$ are meta-variables representing elements in $\mathbf{UBL}$, and $D$, $E$ represent elements in $\mathbf{UL}$. The rule set contains two rules, (1a) and (1b), that are the same as for the crisp case, while rules (2a) to (5b) are the annotated rules homologous to the crisp ones. Finally, rule (6) is specific to the annotated case.

Please note that rule (6) is destructive, i.e., this rule removes the premises as the conclusion is inferred. We also assume that a rule is not applied if the consequence is of the form $\tau\colon\bot$ (see Remark 3.7). The rules are as follows:

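1. Simple:

(a) $\dfrac{G}{G'}$ for a map $\mu\colon G' \to G$

(b) $\dfrac{G}{G'}$ for $G' \subseteq G$

2. Subproperty:

(a) $\dfrac{(A, \mathsf{sp}, B)\colon \lambda_1,\ (B, \mathsf{sp}, C)\colon \lambda_2}{(A, \mathsf{sp}, C)\colon \lambda_1 \otimes \lambda_2}$

(b) $\dfrac{(D, \mathsf{sp}, E)\colon \lambda_1,\ (X, D, Y)\colon \lambda_2}{(X, E, Y)\colon \lambda_1 \otimes \lambda_2}$

3. Subclass:

(a) $\dfrac{(A, \mathsf{sc}, B)\colon \lambda_1,\ (B, \mathsf{sc}, C)\colon \lambda_2}{(A, \mathsf{sc}, C)\colon \lambda_1 \otimes \lambda_2}$

(b) $\dfrac{(A, \mathsf{sc}, B)\colon \lambda_1,\ (X, \mathsf{type}, A)\colon \lambda_2}{(X, \mathsf{type}, B)\colon \lambda_1 \otimes \lambda_2}$

4. Typing:

(a) $\dfrac{(D, \mathsf{dom}, B)\colon \lambda_1,\ (X, D, Y)\colon \lambda_2}{(X, \mathsf{type}, B)\colon \lambda_1 \otimes \lambda_2}$

(b) $\dfrac{(D, \mathsf{range}, B)\colon \lambda_1,\ (X, D, Y)\colon \lambda_2}{(Y, \mathsf{type}, B)\colon \lambda_1 \otimes \lambda_2}$

5. Implicit Typing:

(a) $\dfrac{(A, \mathsf{dom}, B)\colon \lambda_1,\ (D, \mathsf{sp}, A)\colon \lambda_2,\ (X, D, Y)\colon \lambda_3}{(X, \mathsf{type}, B)\colon \lambda_1 \otimes \lambda_2 \otimes \lambda_3}$

(b) $\dfrac{(A, \mathsf{range}, B)\colon \lambda_1,\ (D, \mathsf{sp}, A)\colon \lambda_2,\ (X, D, Y)\colon \lambda_3}{(Y, \mathsf{type}, B)\colon \lambda_1 \otimes \lambda_2 \otimes \lambda_3}$

6. Generalisation:

$\dfrac{(X, A, Y)\colon \lambda_1,\ (X, A, Y)\colon \lambda_2}{(X, A, Y)\colon \lambda_1 \oplus \lambda_2}$

8 Remember that ρdf is already a restriction of RDFS.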

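Proposition 3.2 (Soundness and completeness). For an annotated graph, the proof system $\vdash$ is sound and complete for $\models$, that is, (1) if $G \vdash \tau\colon\lambda$ then $G \models \tau\colon\lambda$, and (2) if $G \models \tau\colon\lambda$ then there is $\lambda' \succeq \lambda$ with $G \vdash \tau\colon\lambda'$.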

We point out that rules 2 to 5 can be represented concisely using the following inference rule:

$$(AG)\quad \frac{\tau_1\colon \lambda_1, \ldots, \tau_n\colon \lambda_n,\ \{\tau_1, \ldots, \tau_n\} \vdash_{\mathrm{RDFS}} \tau}{\tau\colon \bigotimes_i \lambda_i}$$

Essentially, this rule says that if a classical RDFS triple $\tau$ can be inferred by applying a classical RDFS inference rule to triples $\tau_1, \ldots, \tau_n$ (denoted $\{\tau_1, \ldots, \tau_n\} \vdash_{\mathrm{RDFS}} \tau$), then the annotation term of $\tau$ will be $\bigotimes_i \lambda_i$, where $\lambda_i$ is the annotation of triple $\tau_i$. It follows immediately that, by using rule (AG) together with rules (1) and (6) of the deductive system above, it is easy to extend the deductive system to cover complete RDFS.

Finally, as for the classical case, the closure is defined as $cl(G) = \{\tau\colon\lambda \mid G \vdash^* \tau\colon\lambda\}$, where $\vdash^*$ is as $\vdash$ without rule (1a). Note again that the size of the closure of $G$ is polynomial in $|G|$ and can be computed in polynomial time, provided that the computational complexity of the operations $\otimes$ and $\oplus$ is polynomially bounded (from a computational complexity point of view, it is as for the classical case, plus the cost of the operations $\otimes$ and $\oplus$ in $L$). Moreover, propositions similar to Propositions 2.2 and 2.3 hold.
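Because the rule schemata only need $\otimes$ and $\oplus$, the closure can be computed by a naive fixpoint that is parametric in the domain. The Python sketch below is a hypothetical illustration under simplifying assumptions: triples are plain string tuples, only rules (2a) to (4b) and the generalisation rule (6) are implemented (rules (1) and (5) are left out for brevity), and rule (6) is realised by keeping a single $\oplus$-merged annotation per triple. It is not an optimised reasoner.

```python
from typing import Callable, Dict, Hashable, Tuple

Triple = Tuple[str, str, str]
Graph = Dict[Triple, Hashable]   # one (⊕-merged) annotation per triple

def closure(graph: Graph, join: Callable, meet: Callable, bottom: Hashable) -> Graph:
    """Naive fixpoint of annotated rules (2a)-(4b) plus generalisation (6).

    Rules (1a), (1b), (5a) and (5b) are omitted in this sketch.
    """
    g = dict(graph)
    changed = True
    while changed:
        changed = False
        derived = []
        items = list(g.items())
        for (a, p1, b), l1 in items:
            for (x, p2, y), l2 in items:
                m = meet(l1, l2)
                if p1 in ("sp", "sc") and p2 == p1 and b == x:   # (2a), (3a)
                    derived.append(((a, p1, y), m))
                if p1 == "sp" and p2 == a:                       # (2b)
                    derived.append(((x, b, y), m))
                if p1 == "sc" and p2 == "type" and y == a:       # (3b)
                    derived.append(((x, "type", b), m))
                if p1 == "dom" and p2 == a:                      # (4a)
                    derived.append(((x, "type", b), m))
                if p1 == "range" and p2 == a:                    # (4b)
                    derived.append(((y, "type", b), m))
        for t, lam in derived:
            if lam == bottom:          # never derive tau:bottom (Remark 3.7)
                continue
            new = join(g[t], lam) if t in g else lam             # rule (6)
            if g.get(t) != new:
                g[t] = new
                changed = True
    return g

# Example 3.1 with the fuzzy domain: join = max, meet = product t-norm
fuzzy = closure({("SkypeCollab", "sc", "EbayCollab"): 0.3,
                 ("toivo", "type", "SkypeCollab"): 0.5},
                join=max, meet=lambda u, v: u * v, bottom=0.0)
```

Swapping in the temporal or provenance operations only changes the `join`, `meet` and `bottom` arguments, which mirrors the claim that the rule schemata are domain independent.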

Example 3.4. As an example, consider the following triples from our running example:

(youtubeEmp, sc, googleEmp): [2006, 2011]
(chadHurley, type, youtubeEmp): [2005, 2010]

By rule (3b), we infer the following triple:

(chadHurley, type, googleEmp): [2006, 2010]



3.6. Query Answering 

Informally, queries are as for the classical case, where triples are replaced with annotated triples in which annotation variables (taken from an appropriate alphabet and denoted $\Lambda$) may occur. We allow built-in triples of the form $(s, p, o)$, where $p$ is a built-in predicate taken from a reserved vocabulary and having a fixed interpretation on the annotation domain $D$, such as $(\Lambda, \preceq, l)$, stating that the value of $\Lambda$ has to be $\preceq$ the value $l \in L$. We generalise the built-ins to any $n$-ary predicate $p$, where $p$'s arguments may be annotation variables, ρdf variables, domain values of $D$ or values from $\mathbf{UL}$, and $p$ has a fixed interpretation. We will assume that the evaluation of the predicate can be decided in finite time. As for the crisp case, for convenience, we write "functional predicates" as assignments of the form $x := f(\mathbf{z})$ and assume that the function $f(\mathbf{z})$ is safe. We also assume that a non-functional built-in predicate $p(\mathbf{z})$ is safe as well.

For instance, informally, for a given time interval $[t_1, t_2]$ we may define $x := \mathit{length}([t_1, t_2])$ as true if and only if the value of $x$ is $t_2 - t_1$.

Example 3.5. Considering our dataset from Figure [7] 
as input and the query asking for people that work for 
Google between 2002 and 2011 and the temporal term 
at which this was true: 

$q(x, \Lambda) \leftarrow (x, \mathit{worksFor}, \mathit{google})\colon \Lambda',\ \Lambda := (\Lambda' \otimes [2002, 2011])$

will get the following answers:

(steveChen, [2006, 2011])
(chadHurley, [2006, 2010])
(jawedKarim, [2006, 2011])
(larryPage, [2002, 2011])
(sergeyBrin, [2002, 2011]).

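Formally, an annotated query is of the form

$$q(\mathbf{x}, \mathbf{\Lambda}) \leftarrow \exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}')$$

in which $\varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}')$ is a conjunction (as for the crisp case, we use "," as the conjunction symbol) of annotated triples and built-in predicates, $\mathbf{x}$ and $\mathbf{\Lambda}$ are the distinguished variables, $\mathbf{y}$ and $\mathbf{\Lambda}'$ are the vectors of non-distinguished (existentially quantified) variables, and $\mathbf{x}$, $\mathbf{\Lambda}$, $\mathbf{y}$ and $\mathbf{\Lambda}'$ are pairwise disjoint. Variables in $\mathbf{\Lambda}$ and $\mathbf{\Lambda}'$ can only appear in annotations or built-in predicates. The query head contains at least one variable.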






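Given an annotated graph $G$, a query $q(\mathbf{x}, \mathbf{\Lambda}) \leftarrow \exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}')$, a vector $\mathbf{t}$ of terms in $universe(G)$ and a vector $\boldsymbol{\lambda}$ of annotation values in $L$, we say that $q(\mathbf{t}, \boldsymbol{\lambda})$ is entailed by $G$, denoted $G \models q(\mathbf{t}, \boldsymbol{\lambda})$, if and only if in any model $I$ of $G$ there is a vector $\mathbf{t}'$ of terms in $universe(G)$ and a vector $\boldsymbol{\lambda}'$ of annotation values in $L$ such that $I$ is a model of $\varphi(\mathbf{t}, \boldsymbol{\lambda}, \mathbf{t}', \boldsymbol{\lambda}')$. If $G \models q(\mathbf{t}, \boldsymbol{\lambda})$ then $\langle \mathbf{t}, \boldsymbol{\lambda} \rangle$ is called an answer to $q$. The answer set of $q$ w.r.t. $G$ is ($\preceq$ extends to vectors point-wise)

$$ans(G, q) = \{\langle \mathbf{t}, \boldsymbol{\lambda} \rangle \mid G \models q(\mathbf{t}, \boldsymbol{\lambda}),\ \boldsymbol{\lambda} \neq \bot \text{ and for any } \boldsymbol{\lambda}' \neq \boldsymbol{\lambda} \text{ such that } G \models q(\mathbf{t}, \boldsymbol{\lambda}'),\ \boldsymbol{\lambda}' \preceq \boldsymbol{\lambda} \text{ holds}\}.$$

That is, for any tuple $\mathbf{t}$, the vector of annotation values $\boldsymbol{\lambda}$ is as large as possible. This avoids redundant or subsumed answers in the answer set. The following can be shown: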

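Proposition 3.3. Given a graph $G$, $\langle \mathbf{t}, \boldsymbol{\lambda} \rangle$ is an answer to $q$ if and only if $\exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{t}, \boldsymbol{\lambda}, \mathbf{y}, \mathbf{\Lambda}')$ is true in the closure of $G$ and $\boldsymbol{\lambda}$ is $\preceq$-maximal.$^{9}$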
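The maximality condition can be illustrated with a small Python helper. It is a sketch under the assumption that the candidate answers (tuples together with their annotation values) have already been collected from the closure; `maximal_answers` is a hypothetical name, and the order $\preceq$ and $\bot$ are passed in as parameters so that any domain can be plugged in.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Answer = Tuple[Hashable, Hashable]   # (tuple of terms, annotation value)

def maximal_answers(candidates: Iterable[Answer],
                    leq: Callable[[Hashable, Hashable], bool],
                    bottom: Hashable) -> List[Answer]:
    """Drop bottom-annotated answers and answers whose annotation is strictly
    below another annotation found for the same tuple."""
    cands = [(t, lam) for t, lam in candidates if lam != bottom]
    kept: List[Answer] = []
    for t, lam in cands:
        dominated = any(t2 == t and lam2 != lam and leq(lam, lam2)
                        for t2, lam2 in cands)
        if not dominated and (t, lam) not in kept:
            kept.append((t, lam))
    return kept

# Hypothetical fuzzy candidates: the order is <= and bottom is 0.0
found = [(("toivo",), 0.15), (("toivo",), 0.1), (("anna",), 0.0), (("anna",), 0.7)]
best = maximal_answers(found, lambda a, b: a <= b, 0.0)
```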

Therefore, similarly to the crisp case, we may devise a query answering method that computes the closure, stores it in a database and then uses SQL queries with the appropriate support of built-in predicates and domain operations.
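The closure-plus-SQL approach can be sketched with Python's built-in `sqlite3` module. The sketch assumes, for simplicity, that every temporal annotation in the stored closure is a single interval kept in two integer columns; the table name `cl`, its columns and the rows are hypothetical. The query mirrors Example 3.5: $\otimes$ with [2002, 2011] becomes an interval intersection computed with SQLite's multi-argument scalar `MAX`/`MIN`, and rows whose intersection is empty (i.e., annotated with $\bot$) are filtered out.

```python
import sqlite3

# Hypothetical closure rows (s, p, o, start, end); each annotation is assumed
# to be a single interval [start, end].
rows = [
    ("alice", "worksFor", "acme", 1998, 2011),
    ("bob", "worksFor", "acme", 2006, 2010),
    ("carol", "worksFor", "acme", 1990, 2000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cl (s TEXT, p TEXT, o TEXT, t_start INTEGER, t_end INTEGER)")
conn.executemany("INSERT INTO cl VALUES (?, ?, ?, ?, ?)", rows)

# q(x, L) <- (x, worksFor, acme):L', L := L' ⊗ [2002, 2011]
# Multi-argument MAX/MIN are scalar functions in SQLite; the WHERE clause
# drops empty intersections, i.e. answers annotated with bottom.
query = """
    SELECT s, MAX(t_start, :lo), MIN(t_end, :hi)
    FROM cl
    WHERE p = 'worksFor' AND o = 'acme'
      AND MAX(t_start, :lo) <= MIN(t_end, :hi)
    ORDER BY s
"""
answers = conn.execute(query, {"lo": 2002, "hi": 2011}).fetchall()
conn.close()
```

Interval-set annotations would need one row per interval and a post-processing step for $\oplus$ and maximality, which this sketch does not attempt.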

3.7. Queries with aggregates

Next, we extend the query language by allowing so-called aggregates to occur in a query. Essentially, aggregates may be like the usual SQL aggregate functions such as SUM, AVG, MAX and MIN. But we also have domain-specific aggregates such as $\oplus$ and $\otimes$.

The following examples present some queries that can be expressed with the use of built-in predicates and aggregates.

Example 3.6. Using a built-in predicate we can pose a query that, for each employee, retrieves his maximal time of employment for any company in the following way:

$q(x, \mathit{maxL}) \leftarrow (x, \mathit{worksFor}, y)\colon \Lambda,\ \mathit{maxL} := \mathit{maxlength}(\Lambda)$

Here, the maxlength built-in predicate returns, given a set of temporal intervals, the maximal interval length in the set.



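9 $\exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{t}, \boldsymbol{\lambda}, \mathbf{y}, \mathbf{\Lambda}')$ is true in the closure of $G$ if and only if for some $\mathbf{t}', \boldsymbol{\lambda}'$, for all triples in $\varphi(\mathbf{t}, \boldsymbol{\lambda}, \mathbf{t}', \boldsymbol{\lambda}')$ there is a triple in $cl(G)$ that subsumes it and the built-in predicates are true, where an annotated triple $\tau\colon\lambda_1$ subsumes $\tau\colon\lambda_2$ if and only if $\lambda_2 \preceq \lambda_1$.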



Example 3.7. Suppose we are looking for employees that work for some companies for a certain time period, and we would like to know the average length of their employment. Such a query can be expressed as

$q(x, \mathit{avgL}) \leftarrow (x, \mathit{worksFor}, y)\colon \Lambda,\ \mathit{GroupedBy}(x),\ \mathit{avgL} := \mathrm{AVG}[\mathit{length}(\Lambda)]$

Essentially, we group by the employee, compute for each employee the time he worked for a company by means of the built-in function length, and compute the average value for each group. That is, if $g = \{\langle t, t_1 \rangle, \ldots, \langle t, t_n \rangle\}$ is a group of tuples with the same value $t$ for employee $x$ and value $t_i$ for $y$, and the length of employment for $t_i$ is $l_i$ (computed as $\mathit{length}(\cdot)$), then the value of avgL for the group $g$ is $(\sum_i l_i)/n$.
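The built-ins of Examples 3.6 and 3.7 and the grouping step can be sketched in Python. This is an illustration only: it assumes the query bindings are already available as `(x, y, interval)` triples with single-interval annotations, and the function names and sample bindings are hypothetical. Sorting the resulting dictionary by value also gives the ordering requested in Example 3.8.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # single-interval annotation (simplifying assumption)

def length(interval: Interval) -> int:
    """Built-in length([t1, t2]) = t2 - t1."""
    t1, t2 = interval
    return t2 - t1

def maxlength(intervals: List[Interval]) -> int:
    """Largest interval length in a set of intervals (Example 3.6)."""
    return max(length(i) for i in intervals)

def avg_employment(bindings: List[Tuple[str, str, Interval]]) -> Dict[str, float]:
    """GroupedBy(x), avgL := AVG[length(L)] over bindings (x, y, L)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for x, _company, lam in bindings:
        groups[x].append(length(lam))
    return {x: mean(lengths) for x, lengths in groups.items()}

# Hypothetical bindings of (x, worksFor, y): L
bindings = [("alice", "acme", (2000, 2004)),
            ("alice", "initech", (2005, 2011)),
            ("bob", "acme", (2006, 2010))]
avg = avg_employment(bindings)
ranked = sorted(avg.items(), key=lambda kv: kv[1])  # OrderBy(avgL)
```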

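Formally, let $@$ be an aggregate function with $@ \in \{\mathrm{SUM}, \mathrm{AVG}, \mathrm{MAX}, \mathrm{MIN}, \mathrm{COUNT}, \oplus, \otimes\}$; then a query with aggregates is of the form

$$q(\mathbf{x}, \mathbf{\Lambda}, \alpha) \leftarrow \exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}'),\ \mathit{GroupedBy}(\mathbf{w}),\ \alpha := @[f(\mathbf{z})]$$

where $\mathbf{w}$ are variables in $\mathbf{x}$, $\mathbf{y}$ or $\mathbf{\Lambda}$, each variable in $\mathbf{x}$ and $\mathbf{\Lambda}$ occurs in $\mathbf{w}$, and any variable in $\mathbf{z}$ occurs in $\mathbf{y}$ or $\mathbf{\Lambda}'$.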

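From a semantics point of view, we say that $I$ is a model of (satisfies) $q(\mathbf{t}, \boldsymbol{\lambda}, a)$, denoted $I \models q(\mathbf{t}, \boldsymbol{\lambda}, a)$, if and only if

$a = @[a_1, \ldots, a_k]$, where $g = \{\langle \mathbf{t}, \boldsymbol{\lambda}, \mathbf{t}'_1, \boldsymbol{\lambda}'_1 \rangle, \ldots, \langle \mathbf{t}, \boldsymbol{\lambda}, \mathbf{t}'_k, \boldsymbol{\lambda}'_k \rangle\}$ is a group of $k$ tuples with identical projection on the variables in $\mathbf{w}$, $\varphi(\mathbf{t}, \boldsymbol{\lambda}, \mathbf{t}'_r, \boldsymbol{\lambda}'_r)$ is true in $I$, and $a_r = f(\mathbf{u}_r)$, where $\mathbf{u}_r$ is the projection of $\langle \mathbf{t}'_r, \boldsymbol{\lambda}'_r \rangle$ on the variables $\mathbf{z}$.

Now, the notion of $G \models q(\mathbf{t}, \boldsymbol{\lambda}, a)$ is as usual: any model of $G$ is a model of $q(\mathbf{t}, \boldsymbol{\lambda}, a)$.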

Finally, we further allow ordering answers according to some ordering functions.

Example 3.8. Consider Example 3.7. We would additionally like to order the employees according to the average length of employment. Then such a query will be expressed as

$q(x, \mathit{avgL}) \leftarrow (x, \mathit{worksFor}, y)\colon \Lambda,\ \mathit{GroupedBy}(x),\ \mathit{avgL} := \mathrm{AVG}[\mathit{length}(\Lambda)],\ \mathit{OrderBy}(\mathit{avgL})$
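Formally, a query with ordering is of the form

$$q(\mathbf{x}, \mathbf{\Lambda}, z) \leftarrow \exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}'),\ \mathit{OrderBy}(z)$$

or, in case grouping is allowed as well, it is of the form

$$q(\mathbf{x}, \mathbf{\Lambda}, z, \alpha) \leftarrow \exists \mathbf{y}\, \exists \mathbf{\Lambda}'.\, \varphi(\mathbf{x}, \mathbf{\Lambda}, \mathbf{y}, \mathbf{\Lambda}'),\ \mathit{GroupedBy}(\mathbf{w}),\ \alpha := @[f(\mathbf{z})],\ \mathit{OrderBy}(z)$$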

From a semantics point of view, the notion of $G \models q(\mathbf{t}, \boldsymbol{\lambda}, z, a)$ is as before, but the notion of answer set has to be enforced with the fact that the answers are now ordered according to the assignment to the variable $z$. Of course, we require that the set of values over which $z$ ranges can be ordered (like strings, integers or reals). In case the variable $z$ is an annotation variable, the order is induced by $\preceq$. In case $\preceq$ is a partial order, we may use some linearisation method for posets, such as [36]. Finally, note that the SQL-like statement LIMIT($k$) can be added straightforwardly.
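One simple way to linearise a partial order such as $\preceq$ is a topological sort. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) on temporal terms compared with the Hoare order; it is a generic linear-extension method and not necessarily the approach of [36]. Terms are tuples of intervals so that they are hashable, which is an assumption of this sketch.

```python
from graphlib import TopologicalSorter
from typing import List, Tuple

Interval = Tuple[int, int]
Term = Tuple[Interval, ...]   # hashable set of pairwise disjoint intervals

def leq(t1: Term, t2: Term) -> bool:
    """Hoare order: every interval of t1 lies inside some interval of t2."""
    return all(any(a2 <= a1 and b1 <= b2 for a2, b2 in t2) for a1, b1 in t1)

def linearise(values: List[Term]) -> List[Term]:
    """Order values so that u comes before v whenever u is strictly below v."""
    ts = TopologicalSorter()
    for v in values:
        smaller = [u for u in values if u != v and leq(u, v)]
        ts.add(v, *smaller)          # predecessors are emitted first
    return list(ts.static_order())

values = [((2000, 2010),), ((2002, 2005),), ((1990, 1995),), ((2003, 2004),)]
order = linearise(values)
```

Incomparable terms (here `((1990, 1995),)` and the others) may appear in any relative position; only the pairs related by $\preceq$ are constrained.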



4. AnQL: Annotated SPARQL 

Our query language introduced so far allows for conjunctive queries. Languages like SQL and SPARQL allow posing more complex queries, including built-in predicates to filter solutions and advanced features such as negation or aggregates. In this section we present an extension of the SPARQL [11] query language, called AnQL, that enables querying annotated graphs. We begin by presenting some preliminaries on SPARQL.

4.1. SPARQL 

SPARQL [11] is the W3C-recommended query language for RDF. A SPARQL query is defined by a triple $Q = (P, G, V)$, where $P$ is a graph pattern, the dataset $G$ is an RDF graph and $V$ is the result form. We restrict ourselves to SELECT queries in this work, so it is sufficient to consider the result form $V$ as a list of variables.

Remark 4.1. Note that, for presentation purposes, we simplify the notion of datasets by excluding named graphs and thus GRAPH queries. Our definitions can be straightforwardly extended to named graphs and we refer the reader to the SPARQL W3C specification [11] for details.



We base our semantics of SPARQL on the semantics presented by Pérez et al. [37], extending the multiset semantics to lists, which are considered as multisets with a "default" ordering. RDF triples, possibly with variables in subject, predicate or object positions, are called triple patterns. In the basic case, graph patterns are sets of triple patterns, also called basic graph patterns (BGP). Let $\mathbf{U}$, $\mathbf{B}$, $\mathbf{L}$ be defined as before and let $\mathbf{V}$ denote a set of variables, disjoint from $\mathbf{UBL}$. We further denote by $var(P)$ the set of variables present in a graph pattern $P$.

Definition 4.1 (Solution [11, Section 12.3.1]). Given a graph $G$ and a BGP $P$, a solution for $P$ over $G$ is a mapping over a subset $V$ of $var(P)$, i.e., $\theta\colon V \to term(G)$, such that $G \models P\theta$, where $P\theta$ represents the triples obtained by replacing the variables in graph pattern $P$ according to $\theta$, and where $G \models P\theta$ means that any triple in $P\theta$ is entailed by $G$. We call $V$ the domain of $\theta$, denoted by $dom(\theta)$. For convenience, we will sometimes use the notation $\theta = \{x_1/t_1, \ldots, x_n/t_n\}$ to indicate that $\theta(x_i) = t_i$, i.e., variable $x_i$ is assigned to term $t_i$.

Two mappings $\theta_1$ and $\theta_2$ are considered compatible if for all $x \in dom(\theta_1) \cap dom(\theta_2)$, $\theta_1(x) = \theta_2(x)$. We call the evaluation of a BGP $P$ over a graph $G$, denoted $[\![P]\!]_G$, the set of solutions.

Remark 4.2. Note that variables in the domain of $\theta$ play the role of distinguished variables in conjunctive queries and there are no non-distinguished variables.

The notion of solution for BGPs is the same as the notion of answers for conjunctive queries:

Proposition 4.1. Given a graph $G$ and a BGP $P$, the solutions of $P$ are the same as the answers of the query $q(var(P)) \leftarrow P$ (where $var(P)$ is the vector of variables in $P$), i.e., $ans(G, q) = [\![P]\!]_G$.

We present the syntax of SPARQL based on [37] and present graph patterns similarly. A triple pattern $(s, p, o)$ is a graph pattern where $s, o \in \mathbf{ULV}$ and $p \in \mathbf{UV}$.$^{10}$ Sets of triple patterns are called Basic Graph Patterns (BGP). A generic graph pattern is defined in a recursive manner: any BGP is a graph pattern; if $P$ and $P'$ are graph patterns and $R$ is a filter expression (see [11]), then $(P\ \mathrm{AND}\ P')$, $(P\ \mathrm{OPTIONAL}\ P')$, $(P\ \mathrm{UNION}\ P')$ and $(P\ \mathrm{FILTER}\ R)$ are graph patterns. As noted in Remark 4.1, we do not consider GRAPH patterns.



10 We do not consider blank nodes in triple patterns since they can 
be considered as variables. 
Evaluations of more complex patterns, including FILTER, OPTIONAL, AND and UNION patterns, are defined by an algebra that is built on top of this basic graph pattern matching (see [11, 37]).

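Definition 4.2 (SPARQL Relational Algebra). Let $\Omega_1$ and $\Omega_2$ be sets of mappings:

$$\begin{aligned}
\Omega_1 \bowtie \Omega_2 &= \{\theta_1 \cup \theta_2 \mid \theta_1 \in \Omega_1,\ \theta_2 \in \Omega_2,\ \theta_1 \text{ and } \theta_2 \text{ compatible}\}\\
\Omega_1 \cup \Omega_2 &= \{\theta \mid \theta \in \Omega_1 \text{ or } \theta \in \Omega_2\}\\
\Omega_1 - \Omega_2 &= \{\theta_1 \in \Omega_1 \mid \text{for all } \theta_2 \in \Omega_2,\ \theta_1 \text{ and } \theta_2 \text{ not compatible}\}\\
\Omega_1 ⟕ \Omega_2 &= (\Omega_1 \bowtie \Omega_2) \cup (\Omega_1 - \Omega_2)
\end{aligned}$$

Definition 4.3 (Evaluation [37, Definition 2.2]). Let $t = (s, p, o)$ be a triple pattern, $P$, $P_1$, $P_2$ graph patterns and $G$ an RDF graph; then the evaluation $[\![\cdot]\!]_G$ is recursively defined as follows:

$$\begin{aligned}
[\![t]\!]_G &= \{\theta \mid dom(\theta) = var(t) \text{ and } G \models t\theta\}\\
[\![P_1\ \mathrm{AND}\ P_2]\!]_G &= [\![P_1]\!]_G \bowtie [\![P_2]\!]_G\\
[\![P_1\ \mathrm{UNION}\ P_2]\!]_G &= [\![P_1]\!]_G \cup [\![P_2]\!]_G\\
[\![P_1\ \mathrm{OPTIONAL}\ P_2]\!]_G &= [\![P_1]\!]_G ⟕ [\![P_2]\!]_G\\
[\![P\ \mathrm{FILTER}\ R]\!]_G &= \{\theta \in [\![P]\!]_G \mid R\theta \text{ is true}\}
\end{aligned}$$

Let $R$ be a FILTER expression$^{11}$ and $u, v \in \mathbf{V} \cup \mathbf{UBL}$. The valuation of $R$ on a substitution $\theta$, written $R\theta$, is true if:

(1) $R = \mathrm{BOUND}(v)$ with $v \in dom(\theta)$;
(2) $R = \mathrm{isBLANK}(v)$ with $v \in dom(\theta)$ and $\theta(v) \in \mathbf{B}$;
(3) $R = \mathrm{isIRI}(v)$ with $v \in dom(\theta)$ and $\theta(v) \in \mathbf{U}$;
(4) $R = \mathrm{isLITERAL}(v)$ with $v \in dom(\theta)$ and $\theta(v) \in \mathbf{L}$;
(5) $R = (u = v)$ with $u, v \in dom(\theta) \cup \mathbf{UBL}$ and $\theta(u) = \theta(v)$;
(6) $R = (\neg R_1)$ with $R_1\theta$ false;
(7) $R = (R_1 \vee R_2)$ with $R_1\theta$ true or $R_2\theta$ true;
(8) $R = (R_1 \wedge R_2)$ with $R_1\theta$ true and $R_2\theta$ true.

$R\theta$ yields an error (denoted $\varepsilon$) if:

(1) $R = \mathrm{isBLANK}(v)$, $R = \mathrm{isIRI}(v)$ or $R = \mathrm{isLITERAL}(v)$, and $v \notin dom(\theta) \cup \mathbf{UBL}$;
(2) $R = (u = v)$ with $u \notin dom(\theta) \cup \mathbf{UBL}$ or $v \notin dom(\theta) \cup \mathbf{UBL}$;
(3) $R = (\neg R_1)$ and $R_1\theta = \varepsilon$;
(4) $R = (R_1 \vee R_2)$ and ($R_1\theta$ is not true and $R_2\theta$ is not true) and ($R_1\theta = \varepsilon$ or $R_2\theta = \varepsilon$);
(5) $R = (R_1 \wedge R_2)$ and ($R_1\theta = \varepsilon$ or $R_2\theta = \varepsilon$).

Otherwise $R\theta$ is false.

In order to make the presented semantics compliant with the SPARQL specification [11], we need to introduce an extension to consider unsafe FILTERs (also presented in [38]):

Definition 4.4 (OPTIONAL with FILTER Evaluation). Let $P_1$, $P_2$ be graph patterns and $R$ a FILTER expression. A mapping $\theta$ is in $[\![P_1\ \mathrm{OPTIONAL}\ (P_2\ \mathrm{FILTER}\ R)]\!]_G$ if and only if:

• $\theta = \theta_1 \cup \theta_2$, s.t. $\theta_1 \in [\![P_1]\!]_G$ and $\theta_2 \in [\![P_2]\!]_G$ are compatible and $R\theta$ is true, or
• $\theta \in [\![P_1]\!]_G$ and $\forall \theta_2 \in [\![P_2]\!]_G$, $\theta$ and $\theta_2$ are not compatible, or
• $\theta \in [\![P_1]\!]_G$ and $\forall \theta_2 \in [\![P_2]\!]_G$ s.t. $\theta$ and $\theta_2$ are compatible, $R\theta_3$ is false for $\theta_3 = \theta \cup \theta_2$.

11 For simplicity, we will omit from the presentation FILTERs such as comparison operators ('<', '>', '≤', '≥'), data type conversion and string functions, and refer the reader to [11, Section 11.3] for details.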
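The operators of Definition 4.2 translate almost literally into Python. In this sketch (an assumption, not part of the formalism) a mapping is a `dict` from variable names to terms and lists stand for multisets, in line with the list-based reading of the semantics; the function names are our own.

```python
from typing import Dict, List

Mapping = Dict[str, str]          # variable -> RDF term
Bag = List[Mapping]               # a list read as a multiset of mappings

def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Mappings agree on every shared variable."""
    return all(m1[x] == m2[x] for x in m1.keys() & m2.keys())

def join(o1: Bag, o2: Bag) -> Bag:           # Ω1 ⋈ Ω2
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]

def union(o1: Bag, o2: Bag) -> Bag:          # Ω1 ∪ Ω2 (bag union)
    return o1 + o2

def minus(o1: Bag, o2: Bag) -> Bag:          # Ω1 − Ω2
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]

def left_join(o1: Bag, o2: Bag) -> Bag:      # Ω1 ⟕ Ω2, used for OPTIONAL
    return join(o1, o2) + minus(o1, o2)

people = [{"?p": "toivo"}, {"?p": "anna"}]
cars = [{"?p": "toivo", "?c": "renault"}]
optional = left_join(people, cars)  # toivo with his car, anna unextended
```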

4.2. AnQL 

We are now ready to extend SPARQL for querying annotated RDF. We call the novel query language AnQL. For the rest of this section we fix a specific annotation domain, $D = \langle L, \oplus, \otimes, \bot, \top \rangle$, as defined in Section 3.



4.2.1. Syntax 

We take inspiration from the notion of conjunctive annotated queries discussed in Section 3.6. A simple AnQL query is defined, analogously to a SPARQL query, as a quadruple $Q = (P, G, V, \Lambda)$ with the differences that (1) $G$ is an annotated RDF graph; (2) we allow annotated graph patterns as presented in Definition 4.5; and (3) $\Lambda$ is the set of annotation variables taken from an infinite set $\mathbf{A}$ (distinct from $\mathbf{V}$). We further denote by $avar(P)$ the set of annotation variables present in a graph pattern $P$.

Definition 4.5 (Annotated Graph Pattern). Let $\lambda$ be an annotation value from $L$ or an annotation variable from $\mathbf{A}$. We call $\lambda$ an annotation label. Triple patterns in AnQL are defined the same way as in SPARQL. For a triple pattern $\tau$, we call $\tau\colon\lambda$ an annotated triple pattern, and sets of annotated triple patterns are called basic annotated patterns (BAP). A generic annotated graph pattern is defined in a recursive manner: any BAP is an annotated graph pattern; if $P$ and $P'$ are annotated graph patterns and $R$ is a filter expression (see [11]), then $(P\ \mathrm{AND}\ P')$, $(P\ \mathrm{OPTIONAL}\ P')$, $(P\ \mathrm{UNION}\ P')$ and $(P\ \mathrm{FILTER}\ R)$ are annotated graph patterns.

Example 4.1. Suppose we are looking for Ebay employees during some time period who optionally owned a car during that period. This query can be posed as follows:

SELECT ?p ?l ?c WHERE {
  (?p type ebayEmp):?l
  OPTIONAL { (?p hasCar ?c):?l }
}

Assuming the dataset of our running example, extended with the following triples:

(toivo, type, paypalEmp): [2000, 2009]
(toivo, hasCar, peugeot): [1999, 2005]
(toivo, hasCar, renault): [2005, 2010]

we will get the following answers:

θ1 = {?p/toivo, ?l/[2002, 2009]}
θ2 = {?p/toivo, ?l/[2002, 2005], ?c/peugeot}
θ3 = {?p/toivo, ?l/[2005, 2009], ?c/renault}.

The first answer corresponds to the case in which the OPTIONAL pattern is not satisfied, so we get the annotation value [2002, 2009] that corresponds to the time Toivo is an Ebay employee. In the second and third answers, the OPTIONAL pattern is also matched and, in this case, the annotation value is restricted to the time when Toivo is employed by Paypal and has a car.

Note that, as we will see, this first query returns as values of the annotation variable the periods in which a car was owned.

Example 4.2. A slightly different query asks for the employees of Ebay during some time period who optionally owned a car at some point during their stay. This query, which rather returns the time periods of employment, can be written as follows:

SELECT ?p ?l ?c WHERE {
  (?p type ebayEmp):?l
  OPTIONAL { (?p hasCar ?c):?l2
             FILTER (?l2 < ?l) }
}



Using the input data from Example 4.1 we obtain the following answers:

θ1 = {?p/toivo, ?l/[2002, 2009]}
θ2 = {?p/toivo, ?l/[2002, 2009], ?c/renault}

In this example the FILTER behaves as in SPARQL by 
removing from the answer set the mappings that do not 
make the FILTER expression true. 



4.2.2. Semantics 

We are thus ready to define the semantics of AnQL queries by extending the notion of SPARQL BGP matching. As for the SPARQL query language, we define the notion of solutions for BAPs as the counterpart of the answer set of annotated conjunctive queries. Just as matching BGPs against RDF graphs is at the core of SPARQL semantics, matching BAPs against annotated RDF graphs is the heart of the evaluation semantics of AnQL.

We extend the notion of substitution to include a substitution of annotation variables in which we do not allow any assignment of an annotation variable to $\bot$ (of the domain $D$). An annotation value of $\bot$, although it is a valid answer for any triple, does not provide any additional information and thus is of minor interest. Furthermore, allowing it would increase the number of answers unnecessarily.

Definition 4.6 (BAP evaluation). Let $P$ be a BAP and $G$ an annotated RDF graph. We define the evaluation $[\![P]\!]_G$ as the list of substitutions that are solutions of $P$, i.e., $[\![P]\!]_G = \{\theta \mid G \models \theta(P)\}$, where $G \models \theta(P)$ means that every annotated triple in $\theta(P)$ is entailed by $G$.
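To make Definition 4.6 concrete, here is a minimal Python sketch for the simplest BAP, a single annotated triple pattern. It assumes the temporal domain with annotations simplified to sets of years (the empty set standing for $\bot$), stores the graph as a dictionary from triples to annotations, and reduces entailment to lookup, i.e., no RDFS inference is performed. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a temporal annotation is the set of years in which a triple
# holds; the empty frozenset plays the role of the bottom element.
Annotation = FrozenSet[int]
Triple = Tuple[str, str, str]


def years(start: int, end: int) -> Annotation:
    """The closed interval [start, end] as a set of years."""
    return frozenset(range(start, end + 1))


def is_var(term: str) -> bool:
    return term.startswith("?")


def eval_triple_bap(graph: Dict[Triple, Annotation],
                    pattern: Triple, ann_var: str) -> List[Dict[str, object]]:
    """Solutions of the BAP (pattern) : ann_var against graph.

    Entailment is reduced to lookup (no RDFS rules), so the stored
    annotation is taken as the value entailed for the matched triple.
    """
    solutions = []
    for triple, ann in graph.items():
        if not ann:  # never bind an annotation variable to bottom
            continue
        binding: Dict[str, object] = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different terms
            elif term != value:
                break  # constant does not match
        else:  # no break: every position matched
            binding[ann_var] = ann
            solutions.append(binding)
    return solutions
```

For a graph containing (toivo, type, ebayEmp) : [2002, 2009], the pattern (?p type ebayEmp) : ?l yields the single solution {?p/toivo, ?l/[2002, 2009]}.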

As for SPARQL, we have: 

Proposition 4.2. Given an annotated graph $G$ and a BAP $P$, the solutions of $P$ are the same as the answers of the annotated query $q(\mathit{var}(P)) \leftarrow P$ (where $\mathit{var}(P)$ is the vector of variables in $P$), i.e., $\mathit{ans}(G, q) = [\![P]\!]_G$.

For the extension of the SPARQL relational algebra to 
the annotated case we introduce - inspired by the defi- 
nitions in [37] - definitions of compatibility and union 
of substitutions: 

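Definition 4.7 ($\otimes$-compatibility). Two substitutions $\theta_1$ and $\theta_2$ are $\otimes$-compatible if and only if (i) $\theta_1$ and $\theta_2$ are compatible for all the non-annotation variables, i.e., $\theta_1(x) = \theta_2(x)$ for any non-annotation variable $x \in \mathit{dom}(\theta_1) \cap \mathit{dom}(\theta_2)$; and (ii) $\theta_1(\lambda) \otimes \theta_2(\lambda) \neq \bot$ for any annotation variable $\lambda \in \mathit{dom}(\theta_1) \cap \mathit{dom}(\theta_2)$.

Definition 4.8 ($\otimes$-union of substitutions). Given two $\otimes$-compatible substitutions $\theta_1$ and $\theta_2$, the $\otimes$-union of $\theta_1$ and $\theta_2$, denoted $\theta_1 \otimes \theta_2$, is defined as $\theta_1 \cup \theta_2$, with the exception that any annotation variable $\lambda \in \mathit{dom}(\theta_1) \cap \mathit{dom}(\theta_2)$ is mapped to $\theta_1(\lambda) \otimes \theta_2(\lambda)$.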



This query also exposes the issue of unsafe filters, noted in [38]; we presented the semantics to deal with this issue in Definition 4.4.
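Definitions 4.7 and 4.8 translate almost directly into code. The sketch below assumes the simplified temporal domain in which annotations are sets of years, $\otimes$ is set intersection and $\bot$ is the empty set; substitutions are plain dictionaries and the set of annotation variables is passed explicitly. The helper and_join (a hypothetical name) shows how the AND case of the evaluation combines every $\otimes$-compatible pair of solutions.

```python
from typing import Dict, FrozenSet, List, Set

Substitution = Dict[str, object]


def years(start: int, end: int) -> FrozenSet[int]:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


def otimes_compatible(t1: Substitution, t2: Substitution,
                      ann_vars: Set[str]) -> bool:
    """Definition 4.7 in the temporal domain (meet = intersection)."""
    for v in t1.keys() & t2.keys():
        if v in ann_vars:
            if not (t1[v] & t2[v]):  # the meet would be bottom
                return False
        elif t1[v] != t2[v]:
            return False
    return True


def otimes_union(t1: Substitution, t2: Substitution,
                 ann_vars: Set[str]) -> Substitution:
    """Definition 4.8: plain union, except for shared annotation variables."""
    merged = {**t1, **t2}
    for v in t1.keys() & t2.keys():
        if v in ann_vars:
            merged[v] = t1[v] & t2[v]
    return merged


def and_join(omega1: List[Substitution], omega2: List[Substitution],
             ann_vars: Set[str]) -> List[Substitution]:
    """Combine every otimes-compatible pair of solutions."""
    return [otimes_union(a, b, ann_vars)
            for a in omega1 for b in omega2
            if otimes_compatible(a, b, ann_vars)]
```

Joining {?p/toivo, ?l/[2002, 2009]} with a hypothetical {?p/toivo, ?l/[2005, 2006], ?c/car1} keeps ?c and narrows ?l to [2005, 2006].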



We now present the notion of evaluation for generic AnQL graph patterns. This consists of an extension of Definition 4.3.
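Definition 4.9 (Evaluation, extends [37, Definition 2]). Let $P$ be a BAP, $P_1, P_2$ annotated graph patterns, $G$ an annotated graph and $R$ a filter expression; then the evaluation $[\![\cdot]\!]_G$, i.e., the set of answers$^{12}$, is recursively defined as:

$$[\![P]\!]_G = \{\theta \mid \mathit{dom}(\theta) = \mathit{var}(P) \text{ and } G \models \theta(P)\}$$
$$[\![P_1 \text{ AND } P_2]\!]_G = \{\theta_1 \otimes \theta_2 \mid \theta_1 \in [\![P_1]\!]_G,\ \theta_2 \in [\![P_2]\!]_G,\ \theta_1 \text{ and } \theta_2 \text{ are } \otimes\text{-compatible}\}$$
$$[\![P_1 \text{ UNION } P_2]\!]_G = [\![P_1]\!]_G \cup [\![P_2]\!]_G$$
$$[\![P_1 \text{ FILTER } R]\!]_G = \{\theta \mid \theta \in [\![P_1]\!]_G \text{ and } R\theta \text{ is true}\}$$

$[\![P_1 \text{ OPTIONAL } P_2[R]]\!]_G$ is the set of all $\theta$ that meet one of the following conditions:

1. $\theta = \theta_1 \otimes \theta_2$ if $\theta_1 \in [\![P_1]\!]_G$, $\theta_2 \in [\![P_2]\!]_G$, $\theta_1$ and $\theta_2$ are $\otimes$-compatible, and $R\theta$ is true;
2. $\theta = \theta_1 \in [\![P_1]\!]_G$ and for all $\theta_2 \in [\![P_2]\!]_G$ such that $\theta_1$ and $\theta_2$ are $\otimes$-compatible, $R(\theta_1 \otimes \theta_2)$ is true, and for all annotation variables $\lambda \in \mathit{dom}(\theta_1) \cap \mathit{dom}(\theta_2)$, $\theta_2(\lambda) \preceq \theta_1(\lambda)$;
3. $\theta = \theta_1 \in [\![P_1]\!]_G$ and for all $\theta_2 \in [\![P_2]\!]_G$ such that $\theta_1$ and $\theta_2$ are $\otimes$-compatible, $R(\theta_1 \otimes \theta_2)$ is false.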

Let $R$ be a FILTER expression and $x, y \in A \cup L$. In addition to the FILTER expressions presented in Definition 4.3, we further allow the expressions presented next. The valuation of $R$ on a substitution $\theta$, denoted $R\theta$, is true if$^{13}$:

(9) $R = (x \leq y)$ with $x, y \in \mathit{dom}(\theta) \cup L \wedge \theta(x) \preceq \theta(y)$;

(10) $R = p(\vec{z})$ with $p(\vec{z})\theta = \mathit{true}$ if and only if $p(\theta(\vec{z})) = \mathit{true}$, where $p$ is a built-in predicate.

Otherwise $R\theta$ is false.

In the FILTER expressions above, a built-in predicate $p$ is any $n$-ary predicate whose arguments may be variables (annotation and non-annotation ones), domain values of $D$, or values from $UL$; $p$ has a fixed interpretation, and we assume that the evaluation of the predicate can be decided in finite time. Annotation domains may define their own built-in predicates that range over annotation values, as in the following query:

Example 4.3. Consider our example dataset from Figure 1 and suppose that we want to know where chadHurley was working before 2005. This query can be expressed in the following way:



12 Strictly speaking, we consider sequences of answers (note that SPARQL allows duplicates and imposes an order on solutions, cf. Section 4.2.3 below for more discussion), but we stick with set notation here for illustration. Whenever we mean "real" sets, where duplicates are removed, we write $\{\ldots\}_{\mathit{distinct}}$.

13 We consider a simple evaluation of filter expressions where the "error" result is ignored; see [11, Section 11.3] for details.



SELECT ?comp WHERE {
  (chadHurley worksFor ?comp) : ?l
  FILTER (before(?l, [2005]))
}

Remark 4.3. For practical convenience, we retain in $[\![\cdot]\!]_G$ only "domain maximal answers". That is, let us define $\theta' \prec \theta$ if and only if (i) $\theta' \neq \theta$; (ii) $\mathit{dom}(\theta) = \mathit{dom}(\theta')$; (iii) $\theta(x) = \theta'(x)$ for any non-annotation variable $x$; and (iv) $\theta'(\lambda) \preceq \theta(\lambda)$ for any annotation variable $\lambda$. Then, for any $\theta \in [\![P]\!]_G$, we remove any $\theta' \in [\![P]\!]_G$ such that $\theta' \prec \theta$.
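Remark 4.3 amounts to discarding every answer that is strictly dominated by another one. The following sketch does this for the simplified temporal domain used in the earlier sketches (annotations as sets of years, $\preceq$ as set inclusion); the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Substitution = Dict[str, object]


def years(start: int, end: int) -> FrozenSet[int]:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


def strictly_below(small: Substitution, big: Substitution,
                   ann_vars: Set[str]) -> bool:
    """theta' < theta as in Remark 4.3, with set inclusion as the order."""
    if small == big or small.keys() != big.keys():
        return False
    for v in small:
        if v in ann_vars:
            if not small[v] <= big[v]:  # subset test on annotations
                return False
        elif small[v] != big[v]:
            return False
    return True


def domain_maximal(answers: List[Substitution],
                   ann_vars: Set[str]) -> List[Substitution]:
    """Keep only the answers that no other answer strictly dominates."""
    return [t for t in answers
            if not any(strictly_below(t, o, ann_vars) for o in answers)]
```

Duplicated answers are left untouched, in line with the sequence semantics of footnote 12.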

Remark 4.4. Please note that the cases for the evaluation of the OPTIONAL are compliant with the SPARQL specification [11], covering the notion of unsafe FILTERs as presented in [38]. However, there are some peculiarities inherent to the annotated case. More specifically, case 2 introduces the side effect that annotation variables shared by two $\otimes$-compatible mappings may have different values in the answer, depending on whether the OPTIONAL is matched or not. This is the behaviour demonstrated in Example 4.1.



The following proposition shows that we have a conser- 
vative extension of SPARQL: 

Proposition 4.3. Let $Q = \langle P, G, V\rangle$ be a SPARQL query over an RDF graph $G$. Let $G'$ be obtained from $G$ by annotating triples with $\top$. Then $[\![P]\!]_G$ under SPARQL semantics is in one-to-one correspondence to $[\![P]\!]_{G'}$ under AnQL semantics such that for any $\theta \in [\![P]\!]_G$ there is a $\theta' \in [\![P]\!]_{G'}$ with $\theta$ and $\theta'$ coinciding on $\mathit{var}(P)$.

4.2.3. Further Extensions of AnQL 

In this section we present extensions of Definition 4.9 to include variable assignments, aggregates and solution modifiers. These extensions are similar to the ones presented in Section 3.7.



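Definition 4.10. Let $P$ be an annotated graph pattern and $G$ an annotated graph; the evaluation of an ASSIGN statement is defined as:

$$[\![P \text{ ASSIGN } f(\vec{z}) \text{ AS } z]\!]_G = \{\theta \mid \theta_1 \in [\![P]\!]_G,\ \theta = \theta_1[z/f(\theta_1(\vec{z}))]\}$$

where

$$\theta[z/t] = \begin{cases} \theta \cup \{z/t\} & \text{if } z \notin \mathit{dom}(\theta)\\ (\theta \setminus \{z/t'\}) \cup \{z/t\} & \text{otherwise.} \end{cases}$$

Essentially, we assign to the variable $z$ the value $f(\theta_1(\vec{z}))$, which is the evaluation of the function $f(\vec{z})$ with respect to a substitution $\theta_1 \in [\![P]\!]_G$.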
Example 4.4. Using a built-in function, we can retrieve for each employee the length of employment at each company:

SELECT ?x ?y ?z WHERE {
  (?x worksFor ?y) : ?l
  ASSIGN length(?l) AS ?z
}

Here, given a set of temporal intervals, the built-in function length returns the overall total length of the intervals.
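The ASSIGN operator of Definition 4.10 and the length built-in of Example 4.4 can be sketched as follows. A temporal annotation is assumed here to be a list of closed (start, end) year intervals, and length is assumed to merge overlapping intervals and sum end - start; whether [2002, 2009] counts as 7 or 8 years is a convention the text does not fix. Both length and assign are illustrative, not a specified API.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Interval = Tuple[int, int]
Substitution = Dict[str, object]


def length(intervals: List[Interval]) -> int:
    """Total length of a set of closed intervals, overlaps counted once."""
    total = 0
    cur = None  # interval currently being merged
    for start, end in sorted(intervals):
        if cur is not None and start <= cur[1]:
            cur = (cur[0], max(cur[1], end))  # overlapping or touching
        else:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = (start, end)
    if cur is not None:
        total += cur[1] - cur[0]
    return total


def assign(omega: List[Substitution], f: Callable[..., object],
           args: Sequence[str], z: str) -> List[Substitution]:
    """P ASSIGN f(args) AS z: theta_1[z/f(theta_1(args))] for each solution."""
    result = []
    for theta in omega:
        new = dict(theta)  # copy, so the input solutions stay unchanged
        new[z] = f(*(theta[a] for a in args))  # add z, or overwrite it
        result.append(new)
    return result
```

Overwriting an already bound z mirrors the second case of $\theta[z/t]$, which is what Remark 4.5 contrasts with SPARQL 1.1.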

Remark 4.5. Note that this definition is more general than "SELECT expr AS ?var" project expressions in the current SPARQL 1.1 draft [39], since it does not require the assigned variable to be unbound.

We introduce the ORDERBY clause, where the evaluation of a $[\![P \text{ ORDERBY } ?x]\!]_G$ statement is defined as the ordering of the solutions $\theta \in [\![P]\!]_G$ according to the values of $\theta(?x)$. Ordering for non-annotation variables follows the rules in [11, Section 9.1].

Similarly to ordering in the query answering setting, we require that the set of values over which ?x ranges can be ordered; if necessary, some linearisation method for posets, such as [36], may be applied. We can further extend the evaluation of AnQL queries with aggregate functions

$$@ \in \{\mathrm{SUM}, \mathrm{AVG}, \mathrm{MAX}, \mathrm{MIN}, \mathrm{COUNT}, \oplus, \otimes\}$$

as follows:

Definition 4.11. The evaluation of a GROUPBY statement is defined as$^{14}$:

$$[\![P \text{ GROUPBY}(\vec{w})\ @\vec{f}(\vec{z}) \text{ AS } \vec{\alpha}]\!]_G = \{\theta \mid \theta_1 \in [\![P]\!]_G,\ \theta = \theta_1|_{\vec{w}}[\alpha_i/@_i f_i(\theta_i(z_i))]\}_{\mathit{distinct}}$$

where the variables $\alpha_i \notin \mathit{var}(P)$, $z_i \in \mathit{var}(P)$, and none of the GROUPBY variables $\vec{w}$ are included in the aggregation function variables $z_i$. Here, we denote by $\theta|_{\vec{w}}$ the restriction of the variables in $\theta$ to the variables in $\vec{w}$. Using this notation, we can also straightforwardly introduce projection, i.e., sub-SELECTs, as an algebraic operator in the language, covering another new feature of SPARQL 1.1:

$$[\![\text{SELECT } V \{P\}]\!]_G = \{\theta \mid \theta_1 \in [\![P]\!]_G,\ \theta = \theta_1|_V\}.$$



14 In the expression, $@\vec{f}(\vec{z}) \text{ AS } \vec{\alpha}$ is a concise representation of $n$ aggregations of the form $@_i f_i(z_i) \text{ AS } \alpha_i$.



Remark 4.6. Please note that the aggregator functions have a domain of definition and thus can only be applied to values of their respective domain. For example, SUM and AVG can only be used on numeric values, while MAX and MIN are applicable to any total order. Resolution of type mismatches for aggregates is currently being defined in SPARQL 1.1 [39], and we aim to follow it as soon as the language is stable. The COUNT aggregator can be used for any finite set of values. The last two aggregation functions, namely $\oplus$ and $\otimes$, are defined by the annotation domain and thus can be used on any annotation variable.

Remark 4.7. Please note that, unlike the current 
SPARQL 1.1 syntax, assignment, solution modifiers 
(ORDER BY, LIMIT) and aggregation are stand-alone 
operators in our language and do not need to be tied 
to a sub-SELECT but can occur nested within any pat- 
tern. This may be viewed as syntactic sugar allowing for 
more concise writing than the current SPARQL 1.1 [39] draft.

Example 4.5. Suppose we want to know, for each em- 
ployee, the average length of their employments with 
different employers. Then such a query will be ex- 
pressed as: 

SELECT ?x ?avgL WHERE {
  (?x worksFor ?y) : ?l
  GROUPBY (?x)
  AVG(length(?l)) AS ?avgL
}

Essentially, we group by the employee, compute for each employee the time he worked for a company by means of the built-in function length, and compute the average value for each group. That is, if $g = \{\langle t, t_1\rangle, \ldots, \langle t, t_n\rangle\}$ is a group of tuples with the same value $t$ for employee $x$ and value $t_i$ for $y$, where the length of employment for $t_i$ is $l_i$ (computed as $\mathit{length}(\cdot)$), then the value of avgL for the group $g$ is $(\sum_i l_i)/n$.
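The grouping and averaging of Example 4.5 can be sketched in the same style. The sketch assumes the same representation as before (annotations as lists of (start, end) intervals, length as end - start after merging overlaps) and groups by a single variable; group_avg is a hypothetical helper, not part of AnQL.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def length(intervals: List[Interval]) -> int:
    """Total length of closed intervals, overlapping parts counted once."""
    total, cur = 0, None
    for start, end in sorted(intervals):
        if cur is not None and start <= cur[1]:
            cur = (cur[0], max(cur[1], end))
        else:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = (start, end)
    return total + (cur[1] - cur[0] if cur is not None else 0)


def group_avg(omega: List[Dict[str, object]], key: str, ann_var: str,
              out_var: str) -> List[Dict[str, object]]:
    """GROUPBY(key) AVG(length(ann_var)) AS out_var, one solution per group."""
    groups: Dict[object, List[int]] = defaultdict(list)
    for theta in omega:
        groups[theta[key]].append(length(theta[ann_var]))
    return [{key: k, out_var: sum(ls) / len(ls)} for k, ls in groups.items()]
```

With two employments of 4 and 6 years for the same (hypothetical) employee, the group receives avgL = 5.0, i.e., $(\sum_i l_i)/n$ with $n = 2$.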

Proposition 4.4. Assuming the built-in predicates are computable in finite time, the answer set of any AnQL query is finite and can also be computed in finite time.

This proposition can be shown by induction over all the constructs we allow in AnQL.

4.3. Constraints vs Filters 

Please note that FILTERs do not act as constraints over the query. Given the data from our example dataset and the following query:
SELECT ?l1 ?l2 WHERE {
  (?p type youtubeEmp) : ?l1 .
  (steveChen type youtubeEmp) : ?l2
}

with an additional constraint that requires ?l1 to be "before" ?l2, we could expect the answer

{?l1/[2005, 2010], ?l2/[2011, 2011]}.

This answer matches the following triples of our 
dataset: 

(steveChen, type, youtubeEmp) : [2005, 2011]
(chadHurley, type, youtubeEmp) : [2005, 2010]

and satisfies the proposed constraint. However, we require maximality of the annotation values in the answers, and such maximal values do not exist in general in the presence of constraints. For this reason, we do not allow general constraints.

4.4. Union of annotations 

The SPARQL UNION operator may also raise some questions when considering shared annotations between graph patterns. Take, for example, the following query:

SELECT ?l WHERE {
  { (chadHurley type youtubeEmp) : ?l }
  UNION
  { (chadHurley type paypalEmp) : ?l }
}

and assume our dataset from Figure 1 as input. Considering the temporal domain, the intuitive meaning of the query is "retrieve all time periods when chadHurley was an employee of Youtube or PayPal". In the case of UNION patterns, the two instances of the variable ?l are treated as two different variables. If the intended query rather requires treating both instances of the variable ?l as the same, for instance to retrieve the time periods when chadHurley was an employee of either Youtube or PayPal while assuming that we may not have information for one of the patterns, the query should rather look like:

SELECT ?l WHERE {
  { (chadHurley type youtubeEmp) : ?l1 }
  UNION
  { (chadHurley type paypalEmp) : ?l2 }
  ASSIGN ?l1 ∨ ?l2 AS ?l
}

where ∨ represents the domain-specific built-in function for the union of annotations.



5. On primitive domains and their combinations 

In this section we discuss some practical issues related to (i) the representation of the temporal domain (Section 5.1); (ii) the combination of several domains into one compound domain (Section 5.2); (iii) the integration of differently annotated triples or non-annotated triples in the data or query (Section 5.3).



5.1. Temporal issues 

Let us highlight some specific issues inherent to the 
temporal domain. Considering queries using Allen's 
temporal relations [40] (before, after, overlaps, etc.) as
allowed in B, we can pose queries like "find persons 
who were employees of PayPal before toivo". This 
query raises some ambiguity when considering that persons may have been employed by the same company at different, disjoint intervals. Since the temporal domain is modelled by sets of temporal intervals, we can represent such situations. Consider our dataset triples from Figure 1 extended with the following triple:

(toivo, type, paypalEmp) : {[1999, 2004], [2006, 2008]}

Tappolet and Bernstein 0] consider this triple as two 
triples with disjoint intervals as annotations. For the fol- 
lowing query in their language tSPARQL: 

SELECT ?p WHERE {
  [?s1,?e1] ?p type paypalEmp .
  [?s2,?e2] toivo type paypalEmp .
  [?s1,?e1] time:intervalBefore [?s2,?e2]
}

we would get chadHurley as an answer although toivo was already working for PayPal when chadHurley started. This is one possible interpretation of "before" over a set of intervals. In AnQL, we could add different domain-specific built-in predicates representing different interpretations of "before". For instance, we could define the binary built-ins (i) beforeAny(?A1, ?A2), which is true if there exists an interval in annotation ?A1 that is before some interval in ?A2, or (ii) beforeAll(?A1, ?A2), which is only true if all intervals in annotation ?A1 are before every interval in ?A2. Using the latter, an AnQL query would look as follows:

SELECT ?p WHERE {
  (?p type paypalEmp) : ?l1 .
  (toivo type paypalEmp) : ?l2 .
  FILTER (beforeAll(?l1, ?l2))
}
This latter query gives no result, which might comply with people's understanding of "before" in some cases, while we also have the choice to adopt the tSPARQL behaviour described above by using beforeAny instead.

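More formally, if we consider an Allen relation $r$ that holds between individual intervals, we can define a relation over sets of intervals in five different ways:

Definition 5.1. Let $T_1$ and $T_2$ be two non-empty sets of disjoint intervals. We define the following relations:

• $r^{\exists\exists} = \{\langle T_1, T_2\rangle \mid \exists t_1 \in T_1, \exists t_2 \in T_2 \text{ such that } \langle t_1, t_2\rangle \in r\}$;
• $r^{\exists\forall} = \{\langle T_1, T_2\rangle \mid \exists t_1 \in T_1, \forall t_2 \in T_2 \text{ such that } \langle t_1, t_2\rangle \in r\}$;
• $r^{\forall\exists} = \{\langle T_1, T_2\rangle \mid \forall t_1 \in T_1, \exists t_2 \in T_2 \text{ such that } \langle t_1, t_2\rangle \in r\}$;
• $r^{\exists\forall\wedge\forall\exists} = r^{\exists\forall} \cap r^{\forall\exists}$;
• $r^{\forall\forall} = \{\langle T_1, T_2\rangle \mid \forall t_1 \in T_1, \forall t_2 \in T_2 \text{ such that } \langle t_1, t_2\rangle \in r\}$.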
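The five liftings of Definition 5.1 differ only in the kind and order of the quantifiers, which the following sketch makes explicit for Allen's before relation on closed (start, end) intervals, taking t1 before t2 when t1 ends strictly before t2 starts. Under the reading of the built-ins given earlier, beforeAny corresponds to $\mathit{before}^{\exists\exists}$ and beforeAll to $\mathit{before}^{\forall\forall}$; the function names are illustrative only.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[int, int]
Relation = Callable[[Interval, Interval], bool]


def before(t1: Interval, t2: Interval) -> bool:
    """Allen's 'before' on closed intervals: t1 ends before t2 starts."""
    return t1[1] < t2[0]


def r_ee(r: Relation, T1: Sequence[Interval], T2: Sequence[Interval]) -> bool:
    return any(r(a, b) for a in T1 for b in T2)


def r_ea(r: Relation, T1: Sequence[Interval], T2: Sequence[Interval]) -> bool:
    return any(all(r(a, b) for b in T2) for a in T1)


def r_ae(r: Relation, T1: Sequence[Interval], T2: Sequence[Interval]) -> bool:
    return all(any(r(a, b) for b in T2) for a in T1)


def r_ea_and_ae(r: Relation, T1: Sequence[Interval],
                T2: Sequence[Interval]) -> bool:
    return r_ea(r, T1, T2) and r_ae(r, T1, T2)


def r_aa(r: Relation, T1: Sequence[Interval], T2: Sequence[Interval]) -> bool:
    return all(r(a, b) for a in T1 for b in T2)
```

With toivo's PayPal annotation T1 = {[1999, 2004], [2006, 2008]} and a hypothetical T2 = {[2005, 2010]}, $\mathit{before}^{\exists\exists}$ and $\mathit{before}^{\exists\forall}$ hold, while the other three liftings do not, because [2006, 2008] is not before [2005, 2010].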

These relations are illustrated by the following example, taking the Allen relation before:

Example 5.1. Figure 2 shows an example of time intervals that make each of the relations introduced in Definition 5.1 true for the Allen relation before.



It should be noticed that if one sticks to one choice of quantifier, the resulting set of relations does not form a proper relation algebra. Indeed, it is easy to see that, in the first 3 cases, the relations are not disjoint. For instance, two sets of intervals can be involved in both a $\mathit{before}^{\exists\exists}$ and an $\mathit{after}^{\exists\exists}$ relation. On the other hand, the last 4 cases are incomplete, that is, there are pairs of sets of intervals that cannot be related with any of the $r^{\exists\forall}$, $r^{\forall\exists}$, $r^{\forall\forall}$ or $r^{\exists\forall\wedge\forall\exists}$.




Figure 3: Hierarchy of relations.

5.2. Extensions to multiple domains 

Since annotations in our framework can range over 
different domains in different applications, one may 
be interested in combining several annotation domains 



such as annotating triples with a temporal term and a 
truth degree or degree of trust, etc. In [12], we proposed an approach for easily combining multiple domains, based on the pointwise extension of domain operators to a product of domains. Here, we criticise this approach and propose a revised approach that better fits the intuition.

5.2.1. Former approach and criticism 

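The approach described in [12] is the following. In general, assuming we have domains $D_1, \ldots, D_n$, where $D_i = \langle L_i, \oplus_i, \otimes_i, \bot_i, \top_i\rangle$, we may build the domain $D = D_1 \times \ldots \times D_n = \langle L, \oplus, \otimes, \bot, \top\rangle$, where $L = L_1 \times \ldots \times L_n$, $\bot = \langle \bot_1, \ldots, \bot_n\rangle$, $\top = \langle \top_1, \ldots, \top_n\rangle$, and the meet and join operations $\otimes$ and $\oplus$ are extended pointwise to $L$, e.g., $\langle \lambda_1, \ldots, \lambda_n\rangle \otimes \langle \lambda'_1, \ldots, \lambda'_n\rangle = \langle \lambda_1 \otimes_1 \lambda'_1, \ldots, \lambda_n \otimes_n \lambda'_n\rangle$.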
For instance,

(SkypeCollab, SC, EbayCollab) : ⟨[2009, 2011], 0.3⟩

may indicate that during 2009-2011, the collaborators of Skype were also considered collaborators of Ebay to degree 0.3 (here we combine a temporal domain and a fuzzy domain). The interesting point of our approach is that the rules of the deductive systems need not be changed, nor the query answering mechanism (except to provide the support to compute $\otimes$ and $\oplus$ accordingly).
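The pointwise product of [12] is easy to sketch for time+fuzzy. The sketch assumes that time annotations are sets of years with union as $\oplus$ and intersection as $\otimes$, and that the fuzzy domain uses max as $\oplus$ and min as $\otimes$ (the Gödel t-norm, chosen here only for illustration); the operators simply act componentwise. Joining the two annotations used in the discussion that follows reproduces the problematic conclusion.

```python
from typing import FrozenSet, Tuple

# A time+fuzzy annotation in the pointwise product domain D_t x D_f.
Pair = Tuple[FrozenSet[int], float]


def years(start: int, end: int) -> FrozenSet[int]:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


def pointwise_join(a: Pair, b: Pair) -> Pair:
    """oplus on D_t x D_f: union of time points, max of degrees."""
    return (a[0] | b[0], max(a[1], b[1]))


def pointwise_meet(a: Pair, b: Pair) -> Pair:
    """otimes on D_t x D_f: intersection of time points, min of degrees."""
    return (a[0] & b[0], min(a[1], b[1]))


joined = pointwise_join((years(2005, 2009), 1.0), (years(2009, 2011), 0.3))
print(min(joined[0]), max(joined[0]), joined[1])  # prints: 2005 2011 1.0
```

The degree 1 ends up attached to the whole span 2005-2011, because the two components are combined independently of each other.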

The problem with this approach is that the annotations are dealt with independently of each other. As a result, e.g., the truth value 0.3 does not apply to the time range [2009, 2011]. This problem becomes very apparent when one observes the unexpected consequences of our $\oplus$ operator on such a combination:



(SkypeCollab, SC, EbayCollab) : ⟨[2005, 2009], 1⟩
(SkypeCollab, SC, EbayCollab) : ⟨[2009, 2011], 0.3⟩

Applying the point-wise operation $\oplus$ leads to the conclusion:

(SkypeCollab, SC, EbayCollab) : ⟨[2005, 2011], 1⟩

This defies the intuition that, between 2005 and 2009, Skype collaborators were also Ebay collaborators (to degree 1), but from 2009 to 2011 Skype collaborators were Ebay collaborators only to the degree 0.3. The pointwise aggregation does not follow this intuition and levels up everything. In the example above, we would like to say that the fuzzy value itself has a duration, so that the temporal interval corresponds more to an annotation of a quadruple. Note that this problem is not specific to the combination of time and fuzziness.

Figure 2: Temporal relations

We observe a similar issue when combining provenance, for instance, with other domains:

(skypeEmp, SC, ebayEmp) : ⟨[2005, 2009], wikipedia⟩
(skypeEmp, SC, ebayEmp) : ⟨[1958, 2012], wrong⟩

Using a point-wise aggregation method, the result would be:

(skypeEmp, SC, ebayEmp) : ⟨[1958, 2012], wikipedia ∨ wrong⟩

which entails:

(skypeEmp, SC, ebayEmp) : ⟨[1958, 2012], wikipedia⟩

Again, the problem is that provenance here does not de- 
fine the provenance of the temporal annotation and the 
temporal annotation is not local to a certain provenance. 

In order to match the intuition, we devise a systematic 
construction that defines a new compound domain out 
of two existing domains. 

5.2.2. Improved Formalisation 

In this section, we propose a generic construction that 
builds an annotation domain by combining two prede- 
fined domains in a systematic way. To achieve this, we 
will assume the existence of two annotation domains 
$D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1\rangle$ and $D_2 = \langle L_2, \oplus_2, \otimes_2, \bot_2, \top_2\rangle$, which will be instantiated in examples with the temporal domain for $D_1$ (abbreviated $D_t$) and either the fuzzy domain ($D_f$) or the provenance domain ($D_p$) for $D_2$. We denote the temporal and fuzzy combination time+fuzzy, and the temporal and provenance combination time+provenance.

Intuition and desired properties. In our former ap- 
proach, we remarked that some information is lost in the 
join operation. Considering time+fuzzy, we see that the 
join should represent temporary changes in the degree 
of truth of the triple. Yet, it is clear that representing 
such changes cannot be done with a simple pair (inter- 
vals, value). So, as a first extension of our previous naive 
solution, we suggest using sets of pairs of primitive an- 
notations, as exemplified below. 

(SkypeCollab, SC, EbayCollab) : {⟨[2005, 2009], 1⟩, ⟨[2009, 2011], 0.3⟩}

Starting from this, we devise an annotation domain that 
correctly matches the intuitive meaning of the com- 
pound annotations. The annotated triple above can be 
interpreted as follows: for each pair in the annotation, 
for each time point in the temporal value of the pair, 
the triple holds to at least the degree given by the fuzzy 
value of the pair. The time+provenance combination is 
interpreted analogously, except that the triple holds (at 
least) in the context given by the provenance value of 
the pair. 

This interpretation of the compound annotations implies that multiple sets of pairs can convey the exact same information. For example, the following time+fuzzy annotated triples are equivalent:

(SkypeCollab, SC, EbayCollab) : {⟨[2005, 2009], 1⟩}
(SkypeCollab, SC, EbayCollab) : {⟨[2005, 2009], 0.3⟩, ⟨[2005, 2009], 1⟩}

From this observation, we postulate the following de- 
sired property: 

Property 1. For all $x \in L_1$, $y, y' \in L_2$ and for all RDF triples $t$, $t : \{\langle x, y\rangle, \langle x, y'\rangle\}$ is semantically equivalent to $t : \{\langle x, y \oplus_2 y'\rangle\}$.

Consequently, it is always possible to assign a unique element $y \in L_2$ to a given element of $L_1$. Thus, an arbitrary set of pairs in $L_1 \times L_2$ is equivalently representable as a partial mapping from $L_1$ to $L_2$. Additionally, given a certain time interval, we can easily compute the maximum known degree to which a time+fuzzy annotated triple holds. For instance, with the annotation {⟨[2005, 2009], 0.3⟩, ⟨[2008, 2011], 1⟩}, we can assign the degree 1 to any subset of [2008, 2011]; the degree 0.3 to any subset of [2005, 2009] which is not contained in [2008, 2009]; and the degree 0 to any other temporal value.

This remark justifies that we can consider a compound annotation as a total function from $L_1$ to $L_2$. From now on, whenever $A$ is a finite set of pairs, we will denote by $\hat{A}$ the function that maps elements of $L_1$ to an element of $L_2$ that, informally, minimally satisfies the constraints imposed by the pairs in $A$. This is formalised below. For instance, if $A = \{\langle [2005, 2009], 0.3\rangle, \langle [2008, 2011], 1\rangle\}$, then:

$$\hat{A}(x) = \begin{cases} 1 & \text{if } x \subseteq [2008, 2011]\\ 0.3 & \text{if } x \subseteq [2005, 2009] \text{ and } x \not\subseteq [2008, 2009]\\ 0 & \text{otherwise} \end{cases}$$

Whereas in this example, for the time+fuzzy domain, the value of $\hat{A}$ for a particular interval seems to follow quite intuitively, let us next turn to the less obvious combination time+provenance. Here, we postulate that the following triples

t : {⟨[2005, 2009], wikipedia⟩, ⟨[2008, 2011], wrong⟩}
t : {⟨[2005, 2007], wikipedia⟩, ⟨[2007, 2009], wikipedia⟩, ⟨[2008, 2011], wrong⟩}
t : {⟨[2005, 2008], wikipedia⟩, ⟨[2008, 2009], wikipedia ∨ wrong⟩, ⟨[2009, 2011], wrong⟩}

represent in fact equivalent annotations. Let us check the intuition behind this on a particular interval, [2005, 2009], which for the first triple is unambiguously associated with the provenance value wikipedia. Considering the second annotated triple, we observe that the provenance wikipedia can likewise be associated with the interval [2005, 2009], because this provenance is associated with two intervals that, when joined, cover the time span [2005, 2009]. In the case of the last annotated triple, the provenance wikipedia ∨ wrong means that the triple holds in wikipedia as well as in wrong (notice that x ∨ y means that the assertion holds in x and in y likewise; see Section 3.4.3 for details). Intuitively, we expect for the last triple that the provenance associated with the joined interval [2005, 2009] is obtained from applying the meet operator over the respective provenance annotations wikipedia (for the partial interval [2005, 2008]) and wikipedia ∨ wrong (for the partial interval [2008, 2009]), i.e., (wikipedia ∨ wrong) ∧ wikipedia, which, again, is equivalent to wikipedia in the provenance domain. Besides, considering now the interval [2005, 2011], the triple is true in either wikipedia or wrong, which is modelled as (wikipedia ∧ wrong) in the provenance domain. Let us cast this intuition into another property we want to ensure on the function $\hat{A}$:

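Property 2. Given a set of annotation pairs $A$, for all $x_0 \in L_1$, whenever $\exists J \subseteq A$ with $x_0 \preceq_1 \bigoplus_{1,\ \langle x,y\rangle \in J} x$, we have $\hat{A}(x_0) \succeq_2 \bigotimes_{2,\ \langle x,y\rangle \in J} y$.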

Our goal in what follows is to characterise the set of functions associated with a finite set of pairs, that is, $\{\hat{A} \mid A \subseteq L_1 \times L_2\}$, in a manner such that Property 1 and Property 2 are satisfied.

Formalisation. As mentioned before, a compound annotation can be seen as a function that maps values of the first domain to values of the second domain. In order to obtain the desired properties established above, we restrict this function to a particular type of functions that we call quasihomomorphisms, because they closely resemble semiring homomorphisms.

Definition 5.2 (Quasihomomorphism). Let $f$ be a function from $D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1\rangle$ to $D_2 = \langle L_2, \oplus_2, \otimes_2, \bot_2, \top_2\rangle$. $f$ is a quasihomomorphism of domains if and only if for all $x, y \in L_1$: (i) $f(x \oplus_1 y) \succeq_2 f(x) \otimes_2 f(y)$ and (ii) $f(x \otimes_1 y) \succeq_2 f(x) \oplus_2 f(y)$.

We now use quasihomomorphisms to define - on an ab- 
stract level - a compound domain of annotations. 

Definition 5.3 (Compound annotation domain). Given two primitive annotation domains $D_1$ and $D_2$, the compound annotation domain of $D_1$ and $D_2$ is the tuple $\langle L_{12}, \oplus_{12}, \otimes_{12}, \bot_{12}, \top_{12}\rangle$ defined as follows:

• $L_{12}$ is the set of quasihomomorphisms from $D_1$ to $D_2$;
• $\bot_{12}$ is the function defined such that for all $x \in L_1$, $\bot_{12}(x) = \bot_2$;
• $\top_{12}$ is the function defined such that for all $x \in L_1$, $\top_{12}(x) = \top_2$;
• for all $\lambda, \mu \in L_{12}$ and all $x \in L_1$, $(\lambda \oplus_{12} \mu)(x) = \lambda(x) \oplus_2 \mu(x)$;
• for all $\lambda, \mu \in L_{12}$ and all $x \in L_1$, $(\lambda \otimes_{12} \mu)(x) = \lambda(x) \otimes_2 \mu(x)$.

This definition yields again a valid RDF annotation do- 
main, as stated in the following proposition: 

Proposition 5.1. $\langle L_{12}, \oplus_{12}, \otimes_{12}, \bot_{12}, \top_{12}\rangle$ is an idempotent, commutative semiring and $\oplus_{12}$ is $\top_{12}$-annihilating.

Quasihomomorphisms are abstract values that may not be representable syntactically. By analogy with XML datatypes [41], we can say that they represent the value space of the compound domain. In the following, we want to propose a finite representation of some of these functions. Indeed, as we have seen in the examples above, we intend to represent compound annotations just as finite sets of pairs of primitive annotations. Thus, continuing the analogy, the lexical space merely contains finite sets of pairs of primitive annotation values. To complete the definition, we just have to define a mapping from such a finite representation to a corresponding quasihomomorphism, that is, we have to define the lexical-to-value mapping.

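Consider again the (primitive) domains $D_1$ and $D_2$ and let $A \subseteq L_1 \times L_2$ be a finite set of pairs of primitive annotations. We define the function $\hat{A} : D_1 \to D_2$ as follows$^{15}$:

$$\forall z \in L_1,\quad \hat{A}(z) = \mathrm{lub}\Big\{ \bigotimes_{2,\ \langle x,y\rangle \in J} y \ \Big|\ J \subseteq A \text{ and } z \preceq_1 \bigoplus_{1,\ \langle x,y\rangle \in J} x \Big\}.$$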
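For a finite $A$, the definition of $\hat{A}$ can be evaluated by brute force over all subsets $J \subseteq A$. The sketch below does so for time+fuzzy, assuming years-as-sets for $D_1$ (union as $\oplus_1$, inclusion as $\preceq_1$) and, for $D_2$, max as lub and min as $\otimes_2$; for an empty $J$ the join is the empty set and the meet is the top value 1. It is exponential in $|A|$ and only meant to illustrate the formula; lift is a hypothetical name.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Time = FrozenSet[int]


def years(start: int, end: int) -> Time:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


def lift(A: List[Tuple[Time, float]],
         meet2: Callable[[float, float], float] = min) -> Callable[[Time], float]:
    """Return z -> A_hat(z): lub of meet(y over J) for J with z <= join(x over J)."""
    def a_hat(z: Time) -> float:
        best = 0.0  # lub of the empty set: the fuzzy bottom
        for r in range(len(A) + 1):
            for J in combinations(A, r):
                cover = frozenset().union(*(x for x, _ in J))  # join over J
                if z <= cover:
                    value = 1.0  # meet over the empty set: the fuzzy top
                    for _, y in J:
                        value = meet2(value, y)
                    best = max(best, value)
        return best
    return a_hat
```

With A = {⟨[2005, 2009], 0.3⟩, ⟨[2008, 2011], 1⟩} this gives 1 on subsets of [2008, 2011] and 0.3 on [2005, 2007]. The formula also assigns 0.3 to [2005, 2011], which is covered by the join of both intervals, as required by Property 2.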

Theorem 5.2. If $A \subseteq L_1 \times L_2$ is a finite set of pairs of primitive annotations, then $\hat{A}$ is a quasihomomorphism.

The proof is mostly a sequence of manipulations of notation with little subtlety, so we refer the reader to Appendix A for details.



Now, we know that we can translate an arbitrary finite set of pairs of primitive annotations into a compound annotation. However, using arbitrary sets of pairs is problematic in practice for two reasons: (1) several sets of pairs have equivalent meaning$^{16}$, that is, the functions induced by the two sets are identical; (2) the approach neither gives a programmatic way of computing the operations ($\otimes_{12}$, $\oplus_{12}$) on compound annotations, nor gives us a tool to finitely represent the results of these operations.

Thus, we next turn towards how to choose a canonical finite representative for a finite set of annotation pairs. To this end, we need a normalising function $N : 2^{L_1 \times L_2} \to 2^{L_1 \times L_2}$ such that for all $A, A' \subseteq L_1 \times L_2$, $\hat{A} = \widehat{A'}$ if and only if $N(A) = N(A')$. This will in turn also allow us to define the operations $\oplus_{12}$ and $\otimes_{12}$ over the set of normalised annotations.

Normalisation. We propose a normalisation algorithm 
based on two main operations: 

Saturate: informally, the Saturate function increases the size of a set of pairs of annotations by adding any redundant pairs that "result from the application of $\otimes$ and $\oplus$ to values existing in the initial pairs";

Reduce: takes the output of the saturation step and removes "subsumed" pairs.

In particular, the Saturate algorithm adds pairs of annotations to the input such that, in the end, all primitive annotations that can be produced by the use of existing values and operators $\otimes$ and $\oplus$ appear in the output. The algorithms for Saturate, Reduce and Normalise are given in Algorithm 1, Algorithm 2 and Algorithm 3, respectively.

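Algorithm 1 Saturate(A)
Input: $A \subseteq L_1 \times L_2$ finite
Output: Saturate(A)
$R := \emptyset$;
for all $J \subseteq A$ do
  $R := R \cup \{\langle \bigoplus_{1,\ \langle x,y\rangle \in J} x,\ \bigotimes_{2,\ \langle x,y\rangle \in J} y\rangle\}$;
  $R := R \cup \{\langle \bigotimes_{1,\ \langle x,y\rangle \in J} x,\ \bigoplus_{2,\ \langle x,y\rangle \in J} y\rangle\}$;
return R;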
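Algorithm 1 can be sketched generically by passing the four primitive operators as functions. The sketch enumerates only the non-empty subsets $J$ of $A$: for $J = \emptyset$ the two generated pairs contain $\bot_1$ or $\bot_2$ (taking empty joins to be $\bot$ and empty meets to be $\top$) and are removed by Reduce anyway. It is instantiated with the years-as-sets temporal domain and, as the values of Example 5.2 suggest, with the product t-norm as $\otimes_2$ for the fuzzy domain; saturate is a hypothetical name.

```python
from functools import reduce
from itertools import combinations
from typing import Callable, FrozenSet, Set, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def saturate(A: Set[Tuple[X, Y]],
             join1: Callable[[X, X], X], meet1: Callable[[X, X], X],
             join2: Callable[[Y, Y], Y], meet2: Callable[[Y, Y], Y]
             ) -> Set[Tuple[X, Y]]:
    """Algorithm 1 restricted to the non-empty subsets J of A."""
    pairs = list(A)
    R: Set[Tuple[X, Y]] = set()
    for r in range(1, len(pairs) + 1):
        for J in combinations(pairs, r):
            xs = [x for x, _ in J]
            ys = [y for _, y in J]
            R.add((reduce(join1, xs), reduce(meet2, ys)))  # join of times, meet of values
            R.add((reduce(meet1, xs), reduce(join2, ys)))  # meet of times, join of values
    return R


def years(start: int, end: int) -> FrozenSet[int]:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


A = {(years(2000, 2005), 0.7), (years(2002, 2008), 0.5)}
S = saturate(A, frozenset.union, frozenset.intersection, max,
             lambda a, b: a * b)
```

For the annotation of Example 5.2, the subset $J = A$ contributes the pairs ⟨[2000, 2008], 0.35⟩ and ⟨[2002, 2005], 0.7⟩.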

If the operations $\otimes_1$ and $\otimes_2$ are idempotent, Algorithm 1 ensures that, given a value $x \in L_1$ that is the result of using operators $\otimes_1$ and $\oplus_1$ on any number of primitive annotations of $L_1$ appearing in $A$, there exists $y \in L_2$ such that $\langle x, y\rangle$ exists in the output of Saturate. Similarly, given $y \in L_2$ that can be obtained from combinations of values of $L_2$ appearing in $A$ and operators $\otimes_2$ and $\oplus_2$, there exists $x \in L_1$ such that $\langle x, y\rangle$ exists in the output of Saturate.

15 Note that as $D_2$ is an annotation domain, the lub operation is well defined.

16 Particularly, we note that there can still be an infinite set of finite representations of the same compound annotation.

Example 5.2. Consider the following time+fuzzy annotation:

{⟨[2000, 2005], 0.7⟩, ⟨[2002, 2008], 0.5⟩}

Application of the function Saturate gives the following result:

{⟨[2000, 2005], 0.7⟩, ⟨[2002, 2008], 0.5⟩, ⟨[2000, 2008], 0.35⟩, ⟨[2002, 2005], 0.7⟩, ⟨[2000, 2005], 0.49⟩, ⟨[2002, 2005], 0.49⟩}

Now we notice that this can introduce redundant information, which should be eliminated. This is the goal of the function Reduce, which is defined by Algorithm 2.

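Algorithm 2 Reduce(A)
Input: $A \subseteq L_1 \times L_2$ finite and saturated
Output: Reduce(A)
$R := A$;
while $\exists \langle x,y\rangle \in R, \exists \langle x',y'\rangle \in R \setminus \{\langle x,y\rangle\}$ such that $x \preceq_1 x'$ and $y \preceq_2 y'$ do
  $R := R \setminus \{\langle x,y\rangle\}$;
while $\exists \langle x,y\rangle \in R$ such that $x = \bot_1$ or $y = \bot_2$ do
  $R := R \setminus \{\langle x,y\rangle\}$;
return R;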
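Algorithm 2 can be sketched as a single filtering pass. Because the componentwise order on pairs is a partial order, a pair is eventually removed by the first while loop exactly when some other pair dominates it, so removing all dominated pairs at once yields the same result. The sketch assumes the years-as-sets temporal domain and fuzzy degrees, and is applied to the saturated set of Example 5.2; reduce_pairs is a hypothetical name.

```python
from typing import Callable, FrozenSet, Set, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def reduce_pairs(A: Set[Tuple[X, Y]],
                 leq1: Callable[[X, X], bool], leq2: Callable[[Y, Y], bool],
                 bottom1: X, bottom2: Y) -> Set[Tuple[X, Y]]:
    """Algorithm 2: drop dominated pairs, then pairs containing a bottom."""
    undominated = {
        p for p in A
        if not any(q != p and leq1(p[0], q[0]) and leq2(p[1], q[1]) for q in A)
    }
    return {(x, y) for x, y in undominated if x != bottom1 and y != bottom2}


def years(start: int, end: int) -> FrozenSet[int]:
    """The closed interval [start, end] as a set of years (assumed encoding)."""
    return frozenset(range(start, end + 1))


saturated = {
    (years(2000, 2005), 0.7), (years(2002, 2008), 0.5),
    (years(2000, 2008), 0.35), (years(2002, 2005), 0.7),
    (years(2000, 2005), 0.49), (years(2002, 2005), 0.49),
}
reduced = reduce_pairs(saturated, frozenset.issubset, lambda a, b: a <= b,
                       frozenset(), 0.0)
```

The result is the three pairs ⟨[2000, 2005], 0.7⟩, ⟨[2002, 2008], 0.5⟩ and ⟨[2000, 2008], 0.35⟩, matching Example 5.3.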



Note that the pair ⟨[2001, 2006], wikipedia ∨ wrong⟩ of Example 5.4 is introduced by Line 6 of Algorithm 1 and is not discarded during the reduction phase.

The following property can be shown: 

Proposition 5.3. If $D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1\rangle$ is a lattice then, for all finite $A \subseteq L_1 \times L_2$, $\hat{A} = \widehat{\mathrm{Normalise}(A)}$.

Notice that we must impose that the first primitive domain of annotation is a lattice for the normalisation to work, that is, we need that $z \preceq_1 x$ and $z \preceq_1 y$ iff $z \preceq_1 x \otimes_1 y$. Details of the proof can be found in Appendix A.

The following theorem shows that the normalisation 
is actually unique up to equivalence of the correspond- 
ing functions. 

Theorem 5.4. If $D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1\rangle$ is a lattice then, for all finite $A, B \subseteq L_1 \times L_2$, $\hat{A} = \hat{B} \iff \mathrm{Normalise}(A) = \mathrm{Normalise}(B)$.

Again, to improve readability, we put the proof in Appendix A.



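Operations on normalised annotations. We can now present the operations $\oplus_{12}$ and $\otimes_{12}$ on normalised finite sets of pairs:

• $A \oplus_{12} B = \mathrm{Normalise}(A \cup B)$;

• $A \otimes_{12} B = \mathrm{Normalise}(\{\langle x \otimes_1 x', y \otimes_2 y'\rangle \mid \langle\langle x,y\rangle, \langle x',y'\rangle\rangle \in A \times B\})$.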



Example 5.3. Considering as input the output of the Saturate algorithm in Example 5.2, the Reduce function gives the following result:

$\{\langle[2000,2005],0.7\rangle, \langle[2002,2008],0.5\rangle, \langle[2000,2008],0.35\rangle\}$

Finally, with the proposed representation and operations, we devised a systematic approach to compute combinations of domains using existing primitive domains. This implies that an implementation would not need to include operators that are specific to a given combination, as long as programmatic modules exist for the primitive annotation domains.



Algorithm 3 Normalise(A)
Input: $A \subseteq L_1 \times L_2$ finite
Output: Normalise(A)
return Reduce(Saturate(A));



Example 5.4. Consider the following time+provenance annotation:

$\{\langle[1998,2006], \mathit{wikipedia}\rangle, \langle[2001,2011], \mathit{wrong}\rangle\}$

which normalises to:

$\{\langle[1998,2011], \mathit{wikipedia} \wedge \mathit{wrong}\rangle, \langle[1998,2006], \mathit{wikipedia}\rangle, \langle[2001,2011], \mathit{wrong}\rangle, \langle[2001,2006], \mathit{wikipedia} \vee \mathit{wrong}\rangle\}$

5.2.3. Discussion

Our definition of a compound annotation domain is, to the best of our knowledge, a novelty in settings involving annotations: previous work on annotated RDF [15, 12], annotated logic programmes [16] or annotated database relations [17] has not addressed this issue. We present in this section some considerations with respect to the chosen approach.

1. The normalisation algorithm is not optimised and would prove inefficient if directly implemented "as is". In this part, we have provided a working solution for normalising compound annotations as a mere proof of existence of such a solution. By observing the examples that we provide for the time+fuzzy domain, it seems that the cost of normalising can be reduced significantly with appropriate strategies.



2. As indicated by Theorem 5.4, we only ensure that the normalisation is feasible for a combination of annotation domains where at least one is a lattice. Whether a normalisation function exists in the more general case of two commutative, idempotent, $\top$-annihilating semirings is an open question.

3. The method we provide defines a new domain of annotation as a function of existing domains, such that it is possible to reason with and to query triples annotated with pairs of values. This does not mean that it is possible to reason with a combination of triples annotated with values of the first domain and triples annotated with values of the second domain. For instance, reasoning with a combination of temporally annotated triples and fuzzy-annotated triples does not boil down to reasoning over time+fuzzy-annotated triples. The next section discusses this issue and how non-annotated triples can be combined with annotated triples.

5.3. Integrating differently annotated triples in data 
and queries 

While our approach conservatively extends RDFS, we would like to be able to seamlessly reason with and query together annotated and non-annotated triples. Since non-annotated triples can be seen as triples annotated with boolean values, we can generalise this issue to reasoning with and querying graphs annotated with distinct domains. For instance, let us assume that a dataset provides temporally annotated triples, another one contains fuzzy-annotated triples and yet another is a standard RDF dataset. We want to provide a uniform treatment of all these datasets and even handle the merge of differently annotated triples. Moreover, we expect to allow multiple annotation domains in AnQL queries.

5.3.1. Multiple annotation domains in the data
Consider the following example: 

(chadHurley, type, googleEmp) : [2006, 2010]
(chadHurley, type, googleEmp) : 0.7
(googleEmp, sc, Person) : 0.97



We can assume that the subclass relation has been determined by ontology matching algorithms, which typically return confidence measures in the form of a number between 0 and 1. Consider as well the following example queries:

Example 5.5. 

SELECT ?a WHERE { 

(chadHurley type googleEmp) : ?a 

} 

Example 5.6. 

SELECT ?a WHERE { 

(chadHurley type Person) : ?a 

} 

We propose two alternative approaches to deal with multiple annotation domains. The first one simply segregates the domains of annotations, such that no inferences are made across differently annotated triples. The second one takes advantage of the compound domain approach defined in Section 5.2.



Segregation of domains. With this approach, distinct domains are not combined during reasoning, such that the first annotated triple together with the third one would not produce new results. The query from Example 5.5 would have the following answers: {?a/[2006, 2010]}, {?a/0.7}. The query from Example 5.6 would have the answer {?a/0.679} (under the product t-norm $\otimes$, since $0.7 \cdot 0.97 = 0.679$).

The main advantage is that query answering is kept very straightforward. Moreover, it is possible to combine different annotation domains within the query by simply joining results from the segregated datasets. The drawback is that reasoning would not complement non-annotated knowledge with annotated knowledge and vice versa.

Using compound domains. The principle of this approach is to assume that two primitive annotation values from distinct domains actually represent a pair with an implicit default value for the second element. The default value can be domain dependent or generic, such as using $\top$ or $\bot$ systematically. An example of a domain-specific default is found in [2], where the value $[-\infty, \mathit{Now}]$ is used to fill the missing annotations in standard, non-annotated RDF. Note that using $\bot$ as a default would boil down to having segregated datasets, as in the previous approach. The use of $\top$ has the advantage of being generic and allows one to combine knowledge from differently annotated sources in inferences. So, the query in Example 5.5 has the answer $\{?a/\{\langle[2006,2010],1\rangle, \langle[-\infty,+\infty],0.7\rangle\}\}$, while Example 5.6 has the answer $\{?a/\{\langle[2006,2010],0.97\rangle, \langle[-\infty,+\infty],0.679\rangle\}\}$.
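The arithmetic behind these answers can be replayed with a small sketch. It assumes that subclass inference combines the type annotation and the subclass annotation with $\otimes_{12}$, computed component-wise as interval intersection and product t-norm, and it encodes $\top$ of the temporal domain as (-inf, +inf); `lift_time`, `lift_fuzzy` and `times_pair` are hypothetical names.

```python
import math
from typing import Tuple

# Assumed encoding: a time+fuzzy annotation is (interval, degree); the temporal
# top is [-oo, +oo] and the fuzzy top is 1.0.
Interval = Tuple[float, float]
Pair = Tuple[Interval, float]
TOP_TIME: Interval = (-math.inf, math.inf)
TOP_FUZZY = 1.0


def lift_time(iv: Interval) -> Pair:
    """Temporally annotated triple: fill the missing fuzzy value with its top."""
    return (iv, TOP_FUZZY)


def lift_fuzzy(degree: float) -> Pair:
    """Fuzzy-annotated triple: fill the missing temporal value with its top."""
    return (TOP_TIME, degree)


def times_pair(p: Pair, q: Pair) -> Pair:
    """Component-wise product: interval intersection and product t-norm.
    The intervals used below always overlap, so no emptiness check is made."""
    (x, y), (x2, y2) = p, q
    return ((max(x[0], x2[0]), min(x[1], x2[1])), y * y2)


# (chadHurley, type, googleEmp) : [2006, 2010] and : 0.7; (googleEmp, sc, Person) : 0.97
type_annotations = {lift_time((2006, 2010)), lift_fuzzy(0.7)}
subclass_annotation = lift_fuzzy(0.97)

# Subclass inference: each type annotation is combined with the subclass annotation.
derived = {times_pair(p, subclass_annotation) for p in type_annotations}
```

`derived` contains $\langle[2006,2010],0.97\rangle$ and $\langle[-\infty,+\infty],0.7 \times 0.97\rangle$, where the second degree is 0.679 up to floating-point rounding. The two pairs are incomparable, so neither dominates the other. Under segregation of domains only the fuzzy annotations would combine, giving the single value 0.679.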

The main advantage is the possibility to infer new statements by combining various annotated or non-annotated triples. The drawbacks are that (i) our combination approach is, so far, limited to the case where one domain is a lattice; (ii) if triples with a new annotation domain are added, then they add one dimension to the answers, which requires recomputing existing answers; (iii) the combination of more than two domains may be particularly complex and possibly non-commutative.

5.3.2. Multiple annotation domains in the query 

When dealing with multiple domains in the query, we face a similar choice as in the data, but we are also offered the option to replace the default value with a variable. If segregation of domains has been chosen, then distinct domains in the query are only used to match the corresponding data, but it is still possible to combine the results from differently annotated sources. For instance:

SELECT ?e ?c ?t ?f WHERE {
  { (?c sc ebayEmp) : ?f }
  UNION
  { (?e type ?c) : ?t . }
  FILTER( ?t ⪯t [2005, 2011] AND ?f ⪯f 0.5 )
}

This query can be executed even on a dataset that does not include fuzzy values. The fuzzy-annotated triple pattern can simply be ignored and the temporally annotated pattern evaluated.

In the case of the second approach using compound domains, the choices are as follows:

1. add a single fresh annotation variable for all triples in the query that are missing a value for an annotation domain; or

2. add a different fresh annotation variable for each triple in the query; or

3. add a constant annotation such as $\top$ to all missing annotation values.

In later discussions, we will use the meta-variable $\odot_D$ to represent the default value of domain $D$ assigned to annotations in the query triples.
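The three ways of completing a query can be sketched as a rewriting over triple patterns. In this illustration, which is an assumption and not AnQL's actual data structure, each pattern carries a dictionary from domain names to values or variables; the generated variable names such as `?_temporal` and the constant `TOP` (standing for $\top$) are hypothetical.

```python
from itertools import count
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical encoding: each query triple carries a dict domain -> value or variable.
Pattern = Tuple[Triple, Dict[str, str]]


def fill_missing(patterns: List[Pattern], domains: List[str], approach: int) -> List[Pattern]:
    """Complete missing annotation values in a query, following approaches 1-3."""
    fresh = count()
    shared = {d: f"?_{d}" for d in domains}  # approach 1: one variable per domain
    out: List[Pattern] = []
    for triple, ann in patterns:
        full = dict(ann)  # copy, so the input query is left untouched
        for d in domains:
            if d in full:
                continue
            if approach == 1:
                full[d] = shared[d]                # same variable in every triple
            elif approach == 2:
                full[d] = f"?_{d}{next(fresh)}"    # new variable per triple
            elif approach == 3:
                full[d] = "TOP"                    # constant top annotation
            else:
                raise ValueError("approach must be 1, 2 or 3")
        out.append((triple, full))
    return out


query = [(("?p", "type", ":ebayEmp"), {}), (("?p", ":hasCar", "?c"), {})]
```

For the two patterns of the query in Example 5.7 and a missing temporal annotation, approach 1 gives both patterns the variable `?_temporal` (which forces the two annotations to be joined), approach 2 gives them `?_temporal0` and `?_temporal1`, and approach 3 gives both `TOP`.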

Example 5.7. For instance, if we again consider the query (excluding the annotation variables) and input data from Example 4.1, the query would look like:

SELECT ?p ?c WHERE {
  (?p type :ebayEmp)
  OPTIONAL { (?p :hasCar ?c) }
}

Now, given the above three approaches for transforming this query, we would get the following answers:

Approach 1:
?p/toivo
?p/toivo ?c/peugeot
?p/toivo ?c/renault

Approach 2:
?p/toivo ?c/peugeot
?p/toivo ?c/renault

Approach 3:



5.3.3. Querying multi-dimensional domains 

Similarly to the discussion in the previous subsection, 
we can encounter mismatches between the Annotated 
RDF dataset and the AnQL query. In case the AnQL 
query contains only variables for the annotations, the 
query can be answered on any Annotated RDF dataset. 
From a user perspective, the expected answers may dif- 
fer from the actual annotation domain in the dataset, 
e.g., the user may be expecting temporal intervals in 
the answers when the answers actually contain a fuzzy 
value. For this reason some built-in predicates to deter- 
mine the type of annotation should be introduced, like 
isTEMPORAL, isFUZZY, etc. 

If the AnQL query contains annotation values and the 
Annotated RDF dataset contains annotations from a dif- 
ferent domain, one option is to not provide any answers. 
Alternatively, we can consider combining the domain of 
the query with the domain of the annotation into a multi- 
dimensional domain, as illustrated in the next example. 

Example 5.8. Assuming the following input data:

(chadHurley, type, youtubeEmp) : chad

When performing the following query:

SELECT ?p ?c WHERE {
  (?p type ?c) : [2009, 2010]
}

we would interpret the data to the form:

(chadHurley, type, youtubeEmp) : (chad, ⊙temporal)

while the query would be interpreted as:

SELECT ?p ?c WHERE {
  (?p type ?c) : (⊙provenance, [2009, 2010])
}

where $\odot_{\mathit{temporal}}$ and $\odot_{\mathit{provenance}}$ are annotations corresponding to the default values of their respective domains, as discussed in Section 5.3.2. The semantics of combining different domains into one multi-dimensional domain has been discussed in Section 5.2.
6. Implementation Notes

Our prototype implementation is split into two distinct modules: one that implements the Annotated RDFS inferencing, and a second one that implements the AnQL query language and relies on the first module to retrieve the data. Our prototype implementation is based on SWI-Prolog's Semantic Web library [42], and we present the architecture of the implementation in Figure 4.

Our Annotated RDFS module consists of a bottom-up reasoner used to calculate the closure of a given RDF dataset 1). The variable components comprise 2) the specification of the given annotation domain; and 3) the ruleset describing the inference rules and the way the annotation values should be propagated. For 1) we do not suggest a special RDF serialisation for temporal triples but rely on existing proposals using reification [2]. Annotation domains in 2) are to be specified by appropriate lattice operations and by describing default annotations for non-annotated triples.

The rules in 3) are specified using a high-level language for domain-independent rules that abstracts from the peculiarities of the reification syntax. For example, the following rule provides subclass inference in the RDFS ruleset:

rdf(O, rdf:type, C2, V) <==
    rdf(O, rdf:type, C1, V1),
    rdf(C1, rdfs:subClassOf, C2, V2),
    infimum(V1, V2, V).

2) and 3) are independent of each other: it is possible to 
combine arbitrary rulesets and domains (see above). 
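As an illustration of how such a rule could be evaluated bottom-up, the following Python sketch (not the Prolog prototype) applies the subclass rule above until a fixpoint is reached. It assumes fuzzy annotations, `infimum` = min as in Section 6.1, and that a triple derived several times keeps the max ($\oplus$) of its annotations; the function `closure` and the dictionary encoding are hypothetical.

```python
from typing import Callable, Dict, Tuple

Triple = Tuple[str, str, str]
Annotated = Dict[Triple, float]


def closure(facts: Annotated,
            infimum: Callable[[float, float], float],
            supremum: Callable[[float, float], float]) -> Annotated:
    """Naive bottom-up fixpoint of the single rule
    rdf(O, rdf:type, C2, V) <== rdf(O, rdf:type, C1, V1),
                                rdf(C1, rdfs:subClassOf, C2, V2),
                                infimum(V1, V2, V).
    A triple derived more than once keeps the supremum of its annotations."""
    kb = dict(facts)
    changed = True
    while changed:
        changed = False
        types = [(t, v) for t, v in kb.items() if t[1] == "rdf:type"]
        subs = [(t, v) for t, v in kb.items() if t[1] == "rdfs:subClassOf"]
        for (o, _, c1), v1 in types:
            for (c1b, _, c2), v2 in subs:
                if c1 != c1b:
                    continue
                head = (o, "rdf:type", c2)
                v = infimum(v1, v2)
                new = supremum(kb[head], v) if head in kb else v
                if kb.get(head) != new:
                    kb[head] = new
                    changed = True
    return kb


facts = {("chadHurley", "rdf:type", "googleEmp"): 0.7,
         ("googleEmp", "rdfs:subClassOf", "Person"): 0.97,
         ("Person", "rdfs:subClassOf", "Agent"): 0.5}
kb = closure(facts, min, max)
```

Here `kb` gains (chadHurley, rdf:type, Person) : 0.7 and, in a second round, (chadHurley, rdf:type, Agent) : 0.5. Passing different `infimum`/`supremum` functions swaps the domain without touching the rule, which mirrors the independence of 2) and 3).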

The AnQL module, also implemented in Prolog, relies on the SPARQL implementation provided by the ClioPatria Semantic Web Server. For the AnQL implementation, the domain specification needs to be extended with the grammar rules to parse an annotation value and any built-in functions specific to the domain.

More information and downloads of the prototype implementation can be found at http://anql.deri.

6.1. Implementation of specific domains 

For example, for the fuzzy domain the default value is considered to be 1 and the $\otimes$ and $\oplus$ operations are, respectively, the min and max operations. The AnQL grammar rules simply call the parser predicate that parses a decimal value.



http://www.swi-prolog.org/web/ClioPatria/

Figure 4: Annotated RDF implementation schema

As for the temporal domain, we represent triple annotations as ordered lists of disjoint time intervals. This implies some additional care in the construction of the $\otimes$ and $\oplus$ operations. For the representation of $-\infty$ and $+\infty$ we use the inf and sup Prolog atoms, respectively. Concrete time points are represented as integers and we use a standard constraint solver over finite domains (CLPFD) in the $\otimes$ and $\oplus$ operations. The default value for non-annotated triples is [inf, sup]. The $\otimes$ operation is implemented as the recursive intersection of all the elements of the annotation values, i.e., temporal intervals. The $\oplus$ operation is handled by constructing CLPFD expressions that evaluate the union of all the temporal intervals. Again, the AnQL grammar rules take care of adapting the parser to the specific domain, and we have defined the domain built-in operations described in Section 5.1.
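The interval-list representation can be sketched as follows. This is plain Python rather than the CLPFD-based Prolog code: it assumes integer time points (so that adjacent intervals such as [1, 2] and [3, 4] merge into [1, 4]) and uses float('-inf') and float('inf') in place of the inf and sup atoms; `merge_intervals`, `t_plus` and `t_times` are hypothetical names.

```python
from typing import List, Tuple

# Closed interval [start, end] of integer time points; float('-inf') and
# float('inf') stand in for the prototype's inf and sup atoms.
Interval = Tuple[float, float]


def merge_intervals(ivs: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or adjacent ones (integer time points)."""
    out: List[Interval] = []
    for start, end in sorted(ivs):
        if out and start <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def t_plus(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Temporal (+): union of two ordered lists of disjoint intervals."""
    return merge_intervals(a + b)


def t_times(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Temporal (x): intersect every interval of a with every interval of b."""
    pieces = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                pieces.append((s, e))
    return merge_intervals(pieces)
```

For example, `t_times([(2000, 2005)], [(2002, 2008)])` returns `[(2002, 2005)]`, and intersecting the default value [inf, sup] with [2006, 2010] leaves [2006, 2010] unchanged, as expected of a $\top$ element.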



6.2. Use-case example: Sensor Data 

As a use-case for Annotated RDF and AnQL, we present the scenario of exposing sensor readings as RDF data. Representing sensor data as RDF, more specifically as Annotated RDF, enables not only a precise and correct representation of sensor data but also the possibility of interlinking the data with other existing sources on the Web.

Consider the scenario in which each person is assigned a sensor tag (node) to use in a building that is equipped with several sensor base-stations (that will be responsible for recording the presence of tags). Whenever sensor nodes are detected in the proximity of a base-station, sensor readings are created. Normally, such a sensor reading contains the time of the reading, the identifier of the base-station and the tag. For our example we used publicly available datasets that represent the movements of people at a conference. For our test purposes we used a subset of the dataset available at http://people.openpcd.org/meri/openbeacon/sputnik/data/24c3/ with a one-hour time frame.
For the specific Annotated RDF domain, we can take as a starting point the temporal domain, where each triple is annotated with a temporal validity. Conceptually, a temporally annotated triple would look like the following:

(tag4302 locatedIn room103) : [2010-07-28T16:52:00Z, 2010-07-28T14:59:00Z]

stating that the tag represented by the URI tag4302 was in the room identified by room103 during the specified time period. For the URIs we can define a domain vocabulary or rely on an already existing vocabulary.

Since a sensor node can, at any time, be discovered by several base-stations, the issue arises of how to detect which base-station it is closest to. This can be viewed as a data cleanup process that can be achieved as a post-processing step over the stored data. In our specific experiment, the sensor readings were of the following format:

2010-10-11 14:57:51 10.254.2.15 4302 83 
2010-10-11 14:57:51 10.254.3.1 4302 83 
2010-10-11 14:57:51 10.254.2.6 4302 83 

where the columns represent, respectively: 1) the timestamp when the record was created; 2) the IP address of the base station; 3) the tag identifier; and 4) the ssi. The ssi represents the signal strength of the response from the tag. Each base station registers each tag at the same timestamp with different signal strengths, which can be interpreted as follows: the lower the signal strength value is, the closer the tag is to the base station. This value can then be used in the data cleanup process to discard the records of the base stations from which the tag is furthest.

In the data cleanup process we start by grouping, for a given timestamp and tag, all the IPs with the lowest ssi. After this step we can merge all records that share the tag and IP and have consecutive timestamps into a single interval.
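The two cleanup steps can be sketched on records in the format shown above. Assumptions: "consecutive" is read as at most `max_gap` apart (a hypothetical parameter defaulting to one second), ties on the lowest ssi keep every tied IP, and a lower ssi means a closer tag, as stated above; `parse` and `cleanup` are illustrative names.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Reading = Tuple[datetime, str, str, int]      # (timestamp, base-station IP, tag, ssi)
Record = Tuple[str, str, datetime, datetime]  # (tag, IP, start, end)


def parse(line: str) -> Reading:
    date, time, ip, tag, ssi = line.split()
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"), ip, tag, int(ssi)


def cleanup(lines: List[str], max_gap: timedelta = timedelta(seconds=1)) -> List[Record]:
    # Step 1: per (timestamp, tag), keep the base stations with the lowest ssi.
    groups: Dict[Tuple[datetime, str], List[Tuple[int, str]]] = defaultdict(list)
    for ts, ip, tag, ssi in map(parse, lines):
        groups[(ts, tag)].append((ssi, ip))
    kept: Dict[Tuple[str, str], List[datetime]] = defaultdict(list)
    for (ts, tag), readings in groups.items():
        lowest = min(ssi for ssi, _ in readings)
        for ssi, ip in readings:
            if ssi == lowest:
                kept[(tag, ip)].append(ts)
    # Step 2: merge consecutive timestamps of the same (tag, IP) into intervals.
    records: List[Record] = []
    for (tag, ip), stamps in kept.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > max_gap:
                records.append((tag, ip, start, prev))
                start = ts
            prev = ts
        records.append((tag, ip, start, prev))
    return sorted(records)


sample = ["2010-10-11 14:57:51 10.254.2.15 4302 83",
          "2010-10-11 14:57:51 10.254.3.1 4302 83",
          "2010-10-11 14:57:51 10.254.2.6 4302 83"]
```

On the three sample readings above, all base stations report the same ssi (83), so each one keeps a degenerate interval for tag 4302 at 14:57:51. Each resulting (tag, IP, start, end) record could then be published as a temporally annotated triple like the one shown earlier.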

7. Conclusion 

In this paper we have presented a generalised RDF annotation framework that conservatively extends the RDFS semantics, along with an extension of the SPARQL query language to query annotated data. The framework presented here is generic enough to cover other proposals for RDF annotations and their query languages. Our approach extends the classical case of RDFS reasoning with features of different annotation domains, such as temporality, fuzziness, trust, etc., and presents a uniform and programmatic way to combine any annotation domains.



Furthermore, we presented a semantics for an extension of the SPARQL query language, AnQL, that enables querying RDF with annotations. Queries exemplified in related literature for specific extensions of SPARQL can be expressed in AnQL. Notably, our semantics goes beyond the expressivity of the current SPARQL specification and includes some features from SPARQL 1.1 such as aggregates, variable assignments and sub-queries. We also described our implementation of AnQL based on constraint logic programming techniques, along with a practical experiment for representing sensor data as Annotated RDF.



Acknowledgement 

We would like to thank Gergely Lukacsy for his participation in the development of this report. The work presented in this report has been funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and supported by COST Action IC0801 on Agreement Technologies.



References 

[1] F. Manola, E. Miller, RDF Primer, W3C Recommendation, World Wide Web Consortium, available at http://www.w3.org/TR/rdf-primer/ (Feb. 10 2004).

[2] C. Gutierrez, C. A. Hurtado, A. A. Vaisman, Introducing Time 
into RDF, IEEE Transactions on Knowledge and Data Engineer- 
ing 19(2) (2007) 207-218. 

[3] A. Pugliese, O. Udrea, V. S. Subrahmanian, Scaling RDF with 
time, in: J. Huai, R. Chen, H.-W. Hon, Y. Liu, W.-Y. Ma, 
A. Tomkins, X. Zhang (Eds.), Proceedings of the 17th Interna- 
tional Conference on World Wide Web, WWW 2008, Beijing, 
China, April 21-25, 2008, ACM, 2008, pp. 605-614. 

[4] J. Tappolet, A. Bernstein, Applied Temporal RDF: Efficient Temporal Querying of RDF Data with SPARQL, in: Aroyo et al. [44], pp. 308-322.

[5] M. Mazzieri, A. F. Dragoni, A Fuzzy Semantics for the Re- 
source Description Framework, in: Uncertainty Reasoning for 
the Semantic Web I, ISWC International Workshops, URSW 
2005-2007, Revised Selected and Invited Papers, no. 5327 in 
Lecture Notes in Computer Science, Springer, 2008, pp. 244- 
261. 

[6] U. Straccia, A Minimal Deductive System for General Fuzzy 
RDF, in: A. Polleres, T. Swift (Eds.), Web Reasoning and Rule 
Systems, Third International Conference, RR 2009, Chantilly, 
VA, USA, October 25-26, 2009, Proceedings, Vol. 5837 of Lec- 
ture Notes in Computer Science, Springer, 2009, pp. 166-181. 

[7] O. Hartig, Querying Trust in RDF Data with tSPARQL, in: Aroyo et al. [44], pp. 5-20.

[8] S. Schenk, On the Semantics of Trust and Caching in the Seman- 
tic Web, in: Proc. of 7th International Semantic Web Conference 
(ISWC'2008), 2008, pp. 533-549. 

[9] R. Q. Dividino, S. Sizov, S. Staab, B. Schueler, Querying for 
Provenance, Trust, Uncertainty and other Meta Knowledge in 
RDF, Journal of Web Semantics 7 (3) (2009) 204-219. 






[10] D. Brickley, R. Guha, RDF Vocabulary Description Language 1.0: RDF Schema, W3C Recommendation, World Wide Web Consortium, available at http://www.w3.org/TR/rdf-schema/ (Feb. 10 2004).

[11] A. Seaborne, E. Prud'hommeaux, SPARQL Query Language for RDF, W3C Recommendation, World Wide Web Consortium, available at http://www.w3.org/TR/rdf-sparql-query/ (Jan. 15 2008).

[12] U. Straccia, N. Lopes, G. Lukacsy, A. Polleres, A General 
Framework for Representing and Reasoning with Annotated Se- 
mantic Web Data, in: Proceedings of the Twenty-Fourth AAAI 
Conference on Artificial Intelligence (AAAI-10), AAAI Press, 
2010, pp. xxx-xxx. 

[13] N. Lopes, A. Polleres, U. Straccia, A. Zimmermann, AnQL: 
SPARQLing Up Annotated RDF, in: Proceedings of the In- 
ternational Semantic Web Conference (ISWC-10), no. 6496 in 
Lecture Notes in Computer Science, Springer-Verlag, 2010, pp.
518-533. 

[14] O. Udrea, D. R. Recupero, V. S. Subrahmanian, Annotated RDF, 
in: The Semantic Web: Research and Applications, 3rd European Semantic Web Conference, ESWC 2006, no. 4011 in Lecture Notes in Computer Science, Springer, 2006, pp. 487-501.

[15] O. Udrea, D. R. Recupero, V. S. Subrahmanian, Annotated RDF, 
ACM Transactions on Computational Logic 11 (2) (2010) 1-41.

[16] M. Kifer, V. Subrahmanian, Theory of Generalized Annotated 
Logic Programming and its Applications, Journal of Logic Pro- 
gramming 12 (1992) 335-367. 

[17] T. J. Green, G. Karvounarakis, V. Tannen, Provenance Semir- 
ings, in: L. Libkin (Ed.), Proceedings of the Twenty-Sixth 
ACM SIGACT-SIGMOD-SIGART Symposium on Principles 
of Database Systems, June 11-13, 2007, Beijing, China, ACM 
Press, 2007, pp. 31-40.

[18] G. Karvounarakis, Z. G. Ives, V. Tannen, Querying data prove- 
nance, in: A. K. Elmagarmid, D. Agrawal (Eds.), SIGMOD 
Conference, ACM, 2010, pp. 951-962. 

[19] M. Mazzieri, A. F. Dragoni, A Fuzzy Semantics for Semantic 
Web Languages, in: P. C. G. da Costa, K. B. Laskey, K. J. 
Laskey, M. Pool (Eds.), ISWC-URSW, 2005, pp. 12-22. 

[20] M. Mazzieri, A Fuzzy RDF Semantics to Represent Trust Meta- 
data, in: 1st Workshop on Semantic Web Applications and Per- 
spectives (SWAP2004), Ancona, Italy, 2004, pp. 83-89. 

[21] J. J. Carroll, C. Bizer, P. J. Hayes, P. Stickler, Named graphs, 
Journal of Web Semantics 3 (4) (2005) 247-267. 

[22] P. Buneman, E. Kostylev, Annotation Algebras for RDFS, in: 
The Second International Workshop on the role of Semantic 
Web in Provenance Management (SWPM-10), CEUR Work- 
shop Proceedings, 2010. 

[23] A. Hogan, Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora, Ph.D. thesis, Digital Enterprise Research Institute, National University of Ireland, Galway, available from http://aidanhogan.com/docs/thesis/ (defended 2011).

[24] S. Munoz, J. Perez, C. Gutierrez, Minimal Deductive Systems 
for RDF, in: E. Franconi, M. Kifer, W. May (Eds.), The Se- 
mantic Web: Research and Applications, 4th European Seman- 
tic Web Conference, ESWC 2007, Innsbruck, Austria, June 3-7, 
2007, Proceedings, Vol. 4519 of Lecture Notes in Computer Sci- 
ence, Springer, 2007, pp. 53-67. 

[25] P. Hayes, RDF Semantics, W3C Recommendation, World Wide Web Consortium, available at http://www.w3.org/TR/rdf-mt/ (Feb. 10 2004).

[26] C. Gutierrez, C. Hurtado, A. O. Mendelzon, Foundations of Semantic Web Databases, in: A. Deutsch (Ed.), Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, ACM, 2004, pp. 95-106.

[27] G. Ianni, T. Krennwallner, A. Martello, A. Polleres, Dynamic 
Querying of Mass-Storage RDF Data with Rule-Based Entail- 
ment Regimes, in: Bernstein et al. [43], pp. 310-327. 

[28] P. Hajek, Metamathematics of Fuzzy Logic, Trends in Logic, Kluwer Academic Publishers, 1998.

[29] E. P. Klement, R. Mesiar, E. Pap, Triangular Norms, Trends in 
Logic - Studia Logica Library, Kluwer Academic Publishers, 
2000. 

[30] S. Abramsky, A. Jung, Domain Theory, in: S. Abramsky, D. M. 
Gabbay, T. S. E. Maibaum (Eds.), Handbook of Logic in Com- 
puter Science - Volume 3: Semantic Structures, Oxford Univer- 
sity Press, 1994, pp. 1-168. 

[31] L. Ding, T. Finin, Y. Peng, P. P. da Silva, D. L. McGuinness, Tracking RDF Graph Provenance using RDF Molecules, Tech. rep., Knowledge System Lab (2005). URL ftp://ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-05-06.pdf

[32] J. Carroll, C. Bizer, P. J. Hayes, P. Stickler, Named graphs, provenance and trust, in: A. Ellis, T. Hagino (Eds.), Proceedings of the 14th International Conference on World Wide Web, WWW 2005, Chiba, Japan, May 10-14, 2005, ACM Press, 2005, pp. 613-622.

[33] G. Flouris, I. Fundulaki, P. Pediaditis, Y. Theoharis, V. Christophides, Coloring RDF Triples to Capture Provenance, in: Bernstein et al. [43], pp. 196-212.

[34] O. Hartig, Provenance Information in the Web of Data, in: 
C. Bizer, T. Heath, T. Berners-Lee, K. Idehen (Eds.), Linked 
Data on the Web (LDOW 2009), Proceedings of the WWW2009 
Workshop on Linked Data on the Web, Madrid, Spain, April 20, 
2009, Vol. 538 of CEUR Workshop Proceedings, CEUR, 2009. 

[35] R. Delbru, A. Polleres, G. Tummarello, S. Decker, Context 
Dependent Reasoning for Semantic Documents in Sindice, in: 
A. Fokoue, Y. Guo, J. H. T. Liebig (Eds.), 4th International 
Workshop on Scalable Semantic Web Knowledge Base Systems 
(SSWS2008), 2008. 

[36] N. M. Labrador, U. Straccia, Monotonic mappings invari- 
ant linearisation of finite posets, Tech. rep., Computing Re- 
search Repository, available as CoRR technical report at 
http://arxiv.org/abs/1006.2679 (2010). 

[37] J. Perez, M. Arenas, C. Gutierrez, Semantics and complexity of 
SPARQL, ACM Transactions on Database Systems 34 (3). 

[38] R. Angles, C. Gutierrez, The Expressive Power of SPARQL, in: 
A. P. Sheth, S. Staab, M. Dean, M. Paolucci, D. Maynard, T. W. 
Finin, K. Thirunarayan (Eds.), International Semantic Web Conference, Vol. 5318, Springer, 2008, pp. 114-129.

[39] S. Harris, A. Seaborne, SPARQL 1.1 Query Language, W3C Working Draft, W3C, http://www.w3.org/TR/2010/WD-sparql11-query-20100601/ (2010).

[40] J. F. Allen, Maintaining knowledge about temporal intervals, 
Communications of the ACM 26 (11) (1983) 832-843.

[41] D. Peterson, S. S. Gao, A. Malhotra, C. M. Sperberg-McQueen, 
H. S. Thompson, W3C XML Schema Definition Language 
(XSD) 1.1 Part 2: Datatypes, W3C Working Draft, World Wide 
Web Consortium, available at http://www.w3.org/TR/2009/WD-xmlschema11-2-20091203/ (Dec. 3 2009).

[42] J. Wielemaker, Z. Huang, L. van der Meij, SWI-Prolog and the 
Web, Theory and Practice of Logic Programming 8 (3) (2008) 
363-392. 

[43] A. Bernstein, D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, K. Thirunarayan (Eds.), The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings, Vol. 5823 of Lecture Notes in Computer Science, Springer, 2009.

[44] L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano, T. Heath, 
E. Hyvonen, R. Mizoguchi, E. Oren, M. Sabou, E. P. B. Sim- 
perl (Eds.), The Semantic Web: Research and Applications, 6th 
European Semantic Web Conference, ESWC 2009, Heraklion, 
Crete, Greece, May 31-June 4, 2009, Proceedings, Vol. 5554 of 
Lecture Notes in Computer Science, Springer, 2009. 



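Appendix A. Proofs of theorems and propositions

Appendix A.1. Proof of Theorem 5.2

We start by proving that, for all $z, z' \in L_1$, $A(z \oplus_1 z') \succeq_2 A(z) \otimes_2 A(z')$.

Proof Appendix A.1. Let $z, z' \in L_1$. In order to prove the proposition, we introduce the notation $K^A_z =_{\mathrm{def}} \{J \subseteq A \mid z \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x\}$. The property that we want to prove can be rewritten:

$$\mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_{z\oplus_1 z'}\Big\} \succeq_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_{z}\Big\} \otimes_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J'} y \;\Big|\; J' \in K^A_{z'}\Big\}$$

Let us introduce two intermediary lemmas:

Lemma Appendix A.1. For all $z, z' \in L_1$, $K^A_{z\oplus_1 z'} = K^A_z \cap K^A_{z'}$.

Proof Appendix A.2. We simply prove each inclusion separately:

$\subseteq$: let $J \in K^A_{z\oplus_1 z'}$, which implies that $z \oplus_1 z' \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$. So $z \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$ and $z' \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$, that is, $J \in K^A_z$ and $J \in K^A_{z'}$.

$\supseteq$: let $J \in K^A_z \cap K^A_{z'}$, which implies that $z \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$ and $z' \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$, so $z \oplus_1 z' \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$ by definition of $\oplus_1$. Consequently, $J \in K^A_{z\oplus_1 z'}$.

Lemma Appendix A.2. For all $z, z' \in L_1$, $K^A_{z\oplus_1 z'} = \{J \cup J' \mid \langle J, J'\rangle \in K^A_z \times K^A_{z'}\}$.

Proof Appendix A.3. Again, we prove each inclusion separately:

$\subseteq$: trivial, since $J \in K^A_{z\oplus_1 z'}$ implies that $J \in K^A_z \cap K^A_{z'}$ and $J = J \cup J$.

$\supseteq$: let $J \in K^A_z$ and $J' \in K^A_{z'}$. Clearly, $\bigoplus_{1,\langle x,y\rangle\in J} x \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J\cup J'} x$. Consequently, $J \cup J' \in K^A_z$. Symmetrically, we prove that $J \cup J' \in K^A_{z'}$, hence $J \cup J' \in K^A_{z\oplus_1 z'}$ by Lemma Appendix A.1.

This allows us to rewrite the problem into:

$$\mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J\cup J'} y \;\Big|\; \langle J,J'\rangle \in K^A_z \times K^A_{z'}\Big\} \succeq_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_z\Big\} \otimes_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J'} y \;\Big|\; J' \in K^A_{z'}\Big\}$$

This is easily established by distributivity of $\otimes_2$ over $\oplus_2$ in the right-hand side, which turns it into $\mathrm{lub}\{(\bigotimes_{2,\langle x,y\rangle\in J} y) \otimes_2 (\bigotimes_{2,\langle x,y\rangle\in J'} y) \mid \langle J,J'\rangle \in K^A_z \times K^A_{z'}\}$, and by remarking that¹⁸

$$\bigotimes_{2,\langle x,y\rangle\in J\cup J'} y \succeq_2 \Big(\bigotimes_{2,\langle x,y\rangle\in J} y\Big) \otimes_2 \Big(\bigotimes_{2,\langle x,y\rangle\in J'} y\Big).$$

The second part of the proof demonstrates that, for all $z, z' \in L_1$, $A(z \otimes_1 z') \succeq_2 A(z) \oplus_2 A(z')$.

Proof Appendix A.4. Before giving the main arguments for the proof, we rewrite the goal as follows:¹⁹

$$\mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_{z\otimes_1 z'}\Big\} \succeq_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_z\Big\} \oplus_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J'} y \;\Big|\; J' \in K^A_{z'}\Big\}$$

Associativity of $\oplus_2$ simplifies the right-hand side further into $\mathrm{lub}\{\bigotimes_{2,\langle x,y\rangle\in J} y \mid J \in K^A_z \cup K^A_{z'}\}$. We establish the result by first proving the following lemma:

Lemma Appendix A.3. For all $z, z' \in L_1$, $K^A_z \cup K^A_{z'} \subseteq K^A_{z\otimes_1 z'}$.

Proof Appendix A.5. Let $J \in K^A_z$. It holds that $z \preceq_1 \bigoplus_{1,\langle x,y\rangle\in J} x$ and $z \otimes_1 z' \preceq_1 z$. So $J \in K^A_{z\otimes_1 z'}$. The same holds for any $J \in K^A_{z'}$.

This allows us now to easily see that

$$\mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_{z\otimes_1 z'}\Big\} \succeq_2 \mathrm{lub}\Big\{\bigotimes_{2,\langle x,y\rangle\in J} y \;\Big|\; J \in K^A_z \cup K^A_{z'}\Big\}.$$

Notice that the opposite inequality does not hold in general.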

Appendix A.2. Proof of Theorem 5.4

In order to prove the theorem, we first demonstrate the following proposition:

Proposition Appendix A.4. If $D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1\rangle$ is a lattice then, for all finite $A \subseteq L_1 \times L_2$, $A \equiv \mathrm{Normalise}(A)$.

Let us assume that $D_1$ is a lattice. We show the proposition by proving that, at each step of the saturation and reduction, the set $R$ is such that $R \equiv A$. This is trivially true at the initialisation of Saturate. Now let us assume that $R$ satisfies this property at a certain step of the execution. We start by ensuring that, for any set $X$ of subsets of $R$,

$$R \equiv R \cup \Big\{\Big\langle \bigotimes_{1,J\in X} \bigoplus_{1,\langle x,y\rangle\in J} x,\ \bigoplus_{2,J\in X} \bigotimes_{2,\langle x,y\rangle\in J} y \Big\rangle\Big\}$$



¹⁸ Note that if $\otimes_2$ is idempotent, then the inequality becomes an equality.

¹⁹ We here reuse the notation $K^A$ introduced above.

and



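$$R \equiv R \cup \Big\{\Big\langle \bigoplus_{1,J\in X} \bigotimes_{1,\langle x,y\rangle\in J} x,\ \bigotimes_{2,J\in X} \bigoplus_{2,\langle x,y\rangle\in J} y \Big\rangle\Big\}.$$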



We can decompose the proof further by simply showing that, given $\langle a,b\rangle, \langle c,d\rangle \in R$,

$$R \equiv R \cup \{\langle a \otimes_1 c, b \oplus_2 d\rangle\}$$

and

$$R \equiv R \cup \{\langle a \oplus_1 c, b \otimes_2 d\rangle\}.$$

To structure the proof better, we split it into several lemmas corresponding to each of the aforementioned steps.

Lemma Appendix A.5. Let z e L\. The equality 
flU {<a®, c,bffi 2 d)Kz) = R(z) holds. 

Proof Appendix A.6. If the pair (a ®i c, b ©2 d) already be- 
longs to R, the equality is trivial. Let us assume {a 0, c, b ffi 2 
d) i R so that we can easily distinguish between sets that in- 
clude (a ®i c, b ®2 d) and sets that do not. Using the definition 
ofR U {{a ®[ c, b ©2 d)}(z), we can write: 



R U {(a <g>! c, b ffi 2 d)}(z) = R(z) ©2 
lub{( ( ®|p0 2 (b © 2 d) I J c R and 



z<i ( ix Jy' SJ x)®i c)) . 



Further, due to distributivity, 



( >') 02 (b ©^ = (( 2 y) 02 b) ffi 2 (( ^ 2 y) ® 2 d) . 



We can therefore rewrite the previous equality to: 



flU {(a®! c,b®2 d}}(z) = R(z) 
ffi 2 lub{( ®2b\JQRandz <i ( ( ®i jX ) ffij (a 0! c)j 

ffi 2 lubj( ( ® ^v) 8 2 d I / c i? andz <, ( jf ' ,*) ©i (a ®i c)} 

Additionally, since D\ is a lattice, we have 



( 1 x)©i (a®i c) = ( 1 x)©! a)0i ( 1 x)©! c) . 



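So $z \leq_1 (\bigoplus_{1,\,(x,y)\in J} x) \oplus_1 (a \otimes_1 c)$ implies $z \leq_1 (\bigoplus_{1,\,(x,y)\in J} x) \oplus_1 a$ and $z \leq_1 (\bigoplus_{1,\,(x,y)\in J} x) \oplus_1 c$. This means that $J \cup \{(a,b)\} \in \{K \subseteq R \mid z \leq_1 \bigoplus_{1,\,(x,y)\in K} x\}$, so necessarily $(\bigotimes_{2,\,(x,y)\in J} y) \otimes_2 b \leq_2 \hat{R}(z)$. Analogously, $J \cup \{(c,d)\}$ belongs to the same set, so $(\bigotimes_{2,\,(x,y)\in J} y) \otimes_2 d \leq_2 \hat{R}(z)$. Generalising to any suitable $J$ and using the equation above, we conclude that $\widehat{R \cup \{(a \otimes_1 c, b \oplus_2 d)\}}(z) = \hat{R}(z)$.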
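Lemma Appendix A.5 can be checked mechanically on a small instance. The following is a minimal Python sketch under assumptions that are not taken from the paper: $D_1$ is the (distributive) lattice of subsets of a hypothetical four-point time line, with $\oplus_1 = \cup$ and $\otimes_1 = \cap$, and $D_2$ is a fuzzy domain on $[0,1]$ with $\oplus_2 = \max$ and $\otimes_2 = \min$. The helper names `hat` and `lemma_a5_holds` are hypothetical; `hat` evaluates $\hat{R}(z)$ by brute force over all $J \subseteq R$, and exact `Fraction` values avoid floating-point noise.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption, not from the paper):
#   D1 = subsets of a four-point time line, join ⊕1 = union, meet ⊗1 = intersection
#   D2 = degrees in [0, 1], ⊕2 = max, ⊗2 = min
T = frozenset(range(4))


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def hat(R, z):
    """Brute-force R̂(z): max over J ⊆ R covering z of the min of J's degrees."""
    best = Fraction(0)                          # ⊥2
    for J in subsets(R):
        cover, val = frozenset(), Fraction(1)   # ⊥1 and ⊤2
        for x, y in J:
            cover, val = cover | x, min(val, y)
        if z <= cover:                          # z ≤1 cover is set inclusion
            best = max(best, val)
    return best


R = {(frozenset({0, 1}), Fraction(4, 5)),
     (frozenset({1, 2}), Fraction(3, 5)),
     (frozenset({2, 3}), Fraction(1, 2))}
POINTS = [frozenset(s) for s in subsets(sorted(T))]


def lemma_a5_holds(R):
    """Adding (a ⊗1 c, b ⊕2 d) for any (a, b), (c, d) in R leaves R̂ unchanged."""
    for a, b in R:
        for c, d in R:
            extended = R | {(a & c, max(b, d))}
            if any(hat(extended, z) != hat(R, z) for z in POINTS):
                return False
    return True


assert lemma_a5_holds(R)
```

This only exercises one finite instance; it illustrates the statement rather than proving it.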

Now let us prove the second equality.

Lemma Appendix A.6. Let $z \in L_1$. The equality $\widehat{R \cup \{(a \oplus_1 c, b \otimes_2 d)\}}(z) = \hat{R}(z)$ holds.



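Proof Appendix A.7. We apply a similar method as for Lemma Appendix A.5 to obtain the following equality:

$$\begin{aligned}
\widehat{R \cup \{(a \oplus_1 c, b \otimes_2 d)\}}(z) = {} & \hat{R}(z) \oplus_2 \\
& \mathrm{lub}\Big\{\Big(\bigotimes_{2,\,(x,y)\in J} y\Big) \otimes_2 (b \otimes_2 d) \;\Big|\; J \subseteq R \text{ and } z \leq_1 \Big(\bigoplus_{1,\,(x,y)\in J} x\Big) \oplus_1 (a \oplus_1 c)\Big\}.
\end{aligned}$$

The condition on $J$ means that $J \cup \{(a,b),(c,d)\} \in \{K \subseteq R \mid z \leq_1 \bigoplus_{1,\,(x,y)\in K} x\}$, which implies that $(\bigotimes_{2,\,(x,y)\in J} y) \otimes_2 (b \otimes_2 d) \leq_2 \hat{R}(z)$. Generalising this to any suitable $J$, we obtain the equality.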
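Lemma Appendix A.6 can also be exercised on a small instance where $\otimes_2$ is not idempotent, so that the inequality used in its proof need not be an equality. The sketch below assumes, purely for illustration, that $D_1$ is the lattice of subsets of a hypothetical four-point time line ($\oplus_1 = \cup$) and that $D_2$ uses $\oplus_2 = \max$ with the product as $\otimes_2$; the helper names are hypothetical. When $(a,b) = (c,d)$ the added pair is $(a, b \cdot b)$, which is genuinely new.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption, not from the paper):
#   D1 = subsets of a four-point time line, join ⊕1 = union
#   D2 = degrees in [0, 1], ⊕2 = max, ⊗2 = product (not idempotent)
T = frozenset(range(4))


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def hat(R, z):
    """Brute-force R̂(z): max over J ⊆ R covering z of the product of J's degrees."""
    best = Fraction(0)                          # ⊥2
    for J in subsets(R):
        cover, val = frozenset(), Fraction(1)   # ⊥1 and ⊤2
        for x, y in J:
            cover, val = cover | x, val * y
        if z <= cover:
            best = max(best, val)
    return best


R = {(frozenset({0, 1}), Fraction(4, 5)),
     (frozenset({1, 2}), Fraction(3, 5)),
     (frozenset({2, 3}), Fraction(1, 2))}
POINTS = [frozenset(s) for s in subsets(sorted(T))]


def lemma_a6_holds(R):
    """Adding (a ⊕1 c, b ⊗2 d) for any (a, b), (c, d) in R leaves R̂ unchanged."""
    for a, b in R:
        for c, d in R:
            extended = R | {(a | c, b * d)}
            if any(hat(extended, z) != hat(R, z) for z in POINTS):
                return False
    return True


assert lemma_a6_holds(R)
```

Exact `Fraction` arithmetic is used so that products computed in different orders compare equal.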

Now let us prove that the Reduce algorithm preserves the quasi-homomorphism $\hat{A}$.



Lemma Appendix A.7. $\hat{A} = \widehat{\mathrm{Reduce}(A)}$.

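Proof Appendix A.8. Let $(a,b) \in R$ be such that there exists $(a',b') \in R$ with $a \leq_1 a'$ and $b \leq_2 b'$. Using the same approach as in Lemma Appendix A.5, we obtain the following equality:

$$\begin{aligned}
\hat{R}(z) = {} & \widehat{R \setminus \{(a,b)\}}(z) \oplus_2 \\
& \mathrm{lub}\Big\{\Big(\bigotimes_{2,\,(x,y)\in J} y\Big) \otimes_2 b \;\Big|\; J \subseteq R \setminus \{(a,b)\} \text{ and } z \leq_1 \Big(\bigoplus_{1,\,(x,y)\in J} x\Big) \oplus_1 a\Big\}.
\end{aligned}$$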

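From the hypothesis $a \leq_1 a'$, we have that $z \leq_1 (\bigoplus_{1,\,(x,y)\in J} x) \oplus_1 a'$ for any such $J$; since $(a',b') \in R \setminus \{(a,b)\}$, the set $J \cup \{(a',b')\}$ therefore contributes to $\widehat{R \setminus \{(a,b)\}}(z)$. Moreover, for the same $J$, since $b \leq_2 b'$, we have $(\bigotimes_{2,\,(x,y)\in J} y) \otimes_2 b \leq_2 (\bigotimes_{2,\,(x,y)\in J} y) \otimes_2 b'$. Generalising to all adequate $J$, we infer that:

$$\widehat{R \setminus \{(a,b)\}}(z) \geq_2 \mathrm{lub}\Big\{\Big(\bigotimes_{2,\,(x,y)\in J} y\Big) \otimes_2 b \;\Big|\; J \subseteq R \setminus \{(a,b)\} \text{ and } z \leq_1 \Big(\bigoplus_{1,\,(x,y)\in J} x\Big) \oplus_1 a\Big\}$$

and, thus, $\hat{R}(z) = \widehat{R \setminus \{(a,b)\}}(z)$. Similarly, no pair of the form $(\bot_1, y)$ or $(x, \bot_2)$ affects the function $\hat{R}$.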
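The reduction step of Lemma Appendix A.7 can be illustrated with a small Python sketch. It assumes, for illustration only, $D_1$ = subsets of a hypothetical four-point time line ordered by inclusion and $D_2$ = degrees in $[0,1]$ with $\max$ and $\min$; the set `R` deliberately contains one dominated pair, one pair of the form $(\bot_1, y)$ and one of the form $(x, \bot_2)$. The names `hat` and `dominated` are hypothetical helpers, not functions from the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption): D1 = subsets of {0, 1, 2, 3} ordered by
# inclusion (⊕1 = union); D2 = degrees in [0, 1] with ⊕2 = max and ⊗2 = min.
T = frozenset(range(4))
BOT1, BOT2 = frozenset(), Fraction(0)


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def hat(R, z):
    """Brute-force R̂(z) over all J ⊆ R."""
    best = BOT2
    for J in subsets(R):
        cover, val = BOT1, Fraction(1)
        for x, y in J:
            cover, val = cover | x, min(val, y)
        if z <= cover:
            best = max(best, val)
    return best


def dominated(p, R):
    """True if some other pair (a', b') in R has a ≤1 a' and b ≤2 b'."""
    a, b = p
    return any(q != p and a <= q[0] and b <= q[1] for q in R)


R = {(frozenset({0, 1}), Fraction(4, 5)),
     (frozenset({1}), Fraction(7, 10)),      # dominated by ({0, 1}, 4/5)
     (frozenset({1, 2}), Fraction(3, 5)),
     (BOT1, Fraction(9, 10)),                # a pair of the form (⊥1, y)
     (frozenset({3}), BOT2)}                 # a pair of the form (x, ⊥2)

reduced = {p for p in R
           if p[0] != BOT1 and p[1] != BOT2 and not dominated(p, R)}
POINTS = [frozenset(s) for s in subsets(sorted(T))]
assert all(hat(reduced, z) == hat(R, z) for z in POINTS)
```

Here `reduced` keeps only the two pairs ({0, 1}, 4/5) and ({1, 2}, 3/5), and the represented function is unchanged on every point of this instance.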



Now, the proof of Proposition Appendix A.4 follows from an inductive application of Lemmas Appendix A.5, Appendix A.6 and Appendix A.7. Therefore, $\hat{A} = \widehat{\mathrm{Normalise}(A)}$ holds.
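The whole normalisation can be sketched end to end in Python to see Proposition Appendix A.4 at work on one instance. This is a hypothetical variant, not the paper's algorithm verbatim: `saturate` only applies the two binary steps of Lemmas Appendix A.5 and Appendix A.6 until a fixpoint, and `reduce_pairs` drops dominated pairs and pairs of the form $(\bot_1, y)$ or $(x, \bot_2)$. The domains are assumptions as well: subsets of a four-point time line for $D_1$, and $\max$/$\min$ on $[0,1]$ for $D_2$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption): D1 = subsets of {0, 1, 2, 3}
# (⊕1 = union, ⊗1 = intersection); D2 = degrees in [0, 1] (⊕2 = max, ⊗2 = min).
T = frozenset(range(4))
BOT1, BOT2, TOP2 = frozenset(), Fraction(0), Fraction(1)


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def hat(A, x):
    """Brute-force Â(x) over all J ⊆ A."""
    best = BOT2
    for J in subsets(A):
        cover, val = BOT1, TOP2
        for a, b in J:
            cover, val = cover | a, min(val, b)
        if x <= cover:
            best = max(best, val)
    return best


def saturate(A):
    """Close A under the binary steps of Lemmas A.5 and A.6 (hypothetical variant)."""
    R = set(A)
    while True:
        new = {(a | c, min(b, d)) for a, b in R for c, d in R}
        new |= {(a & c, max(b, d)) for a, b in R for c, d in R}
        if new <= R:
            return R
        R |= new


def reduce_pairs(R):
    """Drop dominated pairs and pairs of the form (⊥1, y) or (x, ⊥2)."""
    return {(a, b) for a, b in R
            if a != BOT1 and b != BOT2
            and not any((a2, b2) != (a, b) and a <= a2 and b <= b2
                        for a2, b2 in R)}


def normalise(A):
    return reduce_pairs(saturate(A))


A = {(frozenset({0, 1}), Fraction(4, 5)), (frozenset({1, 2}), Fraction(3, 5))}
N = normalise(A)
POINTS = [frozenset(s) for s in subsets(sorted(T))]
assert all(hat(A, x) == hat(N, x) for x in POINTS)   # Proposition A.4 on this instance
```

On this input, saturation adds ({0, 1, 2}, 3/5) and ({1}, 4/5); the reduction then keeps only ({0, 1}, 4/5) and ({0, 1, 2}, 3/5), and both sets represent the same function.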

Proof of the theorem. The implication $\Leftarrow$ is a direct consequence of Proposition Appendix A.4.

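Let us prove the other direction. Let $A$ and $B$ be two finite sets of pairs of primitive annotations in $L_1 \times L_2$ such that $\hat{A} = \hat{B}$. For $A \subseteq L_1 \times L_2$ and $x \in L_1$, let $K_x^A = \{J \subseteq A \mid x \leq_1 \bigoplus_{1,\,(a,b)\in J} a\}$. We also recall that:

$$\hat{A}(x) = \bigoplus_{2,\,J\in K_x^A} \bigotimes_{2,\,(a,b)\in J} b.$$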

Moreover, we introduce the following new notation:

$$\check{A}: L_1 \to L_1, \qquad x \mapsto \bigotimes_{1,\,J\in K_x^A} \bigoplus_{1,\,(a,b)\in J} a.$$

We establish the proof with the help of several intermediate lemmas.
Lemma Appendix A.8. If $(x,y) \in A$ then $y \leq_2 \hat{A}(x)$.

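Proof Appendix A.9. Let $(x,y) \in A$. We remark that $x \leq_1 \bigoplus_{1,\,(a,b)\in\{(x,y)\}} a$ and $\{(x,y)\} \subseteq A$, so $\{(x,y)\} \in K_x^A$ and:

$$y = \bigotimes_{2,\,(a,b)\in\{(x,y)\}} b \leq_2 \bigoplus_{2,\,J\in K_x^A} \bigotimes_{2,\,(a,b)\in J} b = \hat{A}(x).$$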



Lemma Appendix A.9. If $(x,y) \in A$ then $x \leq_1 \check{A}(x)$.

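Proof Appendix A.10. Let $(x,y) \in A$. For all $J \in K_x^A$, $x \leq_1 \bigoplus_{1,\,(a,b)\in J} a$ so, since $D_1$ is a lattice (and $\otimes_1$ is therefore its greatest lower bound), $x \leq_1 \bigotimes_{1,\,J\in K_x^A} \bigoplus_{1,\,(a,b)\in J} a$, that is, $x \leq_1 \check{A}(x)$.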
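The two functions $\hat{A}$ and $\check{A}$ can be computed directly from their definitions on a small example, which also lets one check Lemmas Appendix A.8 and Appendix A.9. The sketch below is illustrative only: it assumes $D_1$ = subsets of a hypothetical four-point time line ($\oplus_1 = \cup$, $\otimes_1 = \cap$, $\top_1$ = the whole line) and $D_2$ = degrees in $[0,1]$ with $\max$/$\min$; the helper names `join1`, `meet2`, `K`, `hat` and `check` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption): D1 = subsets of {0, 1, 2, 3}
# (⊕1 = union, ⊗1 = intersection, ⊤1 = the whole set); D2 = degrees in [0, 1]
# (⊕2 = max, ⊗2 = min).
T = frozenset(range(4))


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def join1(J):
    """⊕1 of the first components of J (⊥1 = empty set when J is empty)."""
    return frozenset().union(*(a for a, _ in J))


def meet2(J):
    """⊗2 of the second components of J (⊤2 = 1 when J is empty)."""
    return min((b for _, b in J), default=Fraction(1))


def K(A, x):
    """K_x^A: the subsets J of A with x ≤1 ⊕1 of their first components."""
    return [J for J in subsets(A) if x <= join1(J)]


def hat(A, x):
    """Â(x) = ⊕2 over J in K_x^A of ⊗2 over J (⊥2 = 0 when K_x^A is empty)."""
    return max((meet2(J) for J in K(A, x)), default=Fraction(0))


def check(A, x):
    """Ǎ(x) = ⊗1 over J in K_x^A of ⊕1 over J (⊤1 = T when K_x^A is empty)."""
    return T.intersection(*(join1(J) for J in K(A, x)))


A = {(frozenset({0, 1}), Fraction(4, 5)), (frozenset({1, 2}), Fraction(3, 5))}
for x, y in A:
    assert y <= hat(A, x)      # Lemma A.8
    assert x <= check(A, x)    # Lemma A.9
```

For instance, $\check{A}(\{1\})$ is the intersection of the covers {0, 1}, {1, 2} and {0, 1, 2}, that is {1}.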



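Proof Appendix A.14 (of Theorem 5.4). Let $(x,y) \in \mathrm{Normalise}(A)$. From Lemma Appendix A.11 we know that $x = \check{A}(x)$ and $y = \hat{A}(x)$. Moreover, from the hypothesis of the theorem, $\hat{A}(x) = \hat{B}(x)$. Hence, due to Lemma Appendix A.12 applied to $B$, there exists $u \in L_1$ such that $(u,y) \in \mathrm{Normalise}(B)$ and $x \leq_1 u$. By the same reasoning, with the roles of $A$ and $B$ exchanged, there exists $v \in L_1$ such that $(v,y) \in \mathrm{Normalise}(A)$ and $u \leq_1 v$. But $(x,y)$ and $(v,y)$ both belong to $\mathrm{Normalise}(A)$ with $x \leq_1 v$, so, as in the proof of Lemma Appendix A.11, the Reduce step implies $x = v$. Hence $x \leq_1 u \leq_1 x$, so $u = x$ and therefore $(x,y) \in \mathrm{Normalise}(B)$.

The situation is symmetrical with respect to $A$ and $B$, so finally $\mathrm{Normalise}(A) = \mathrm{Normalise}(B)$.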



Lemma Appendix A.10. If $D_1 = \langle L_1, \oplus_1, \otimes_1, \bot_1, \top_1 \rangle$ is a lattice, then a quasi-homomorphism is an antitone function with respect to the orders induced by $\oplus_1$ and $\oplus_2$.

Proof Appendix A.11. Assume that $D_1$ is a lattice. Let $f$ be a quasi-homomorphism. Let $x, x' \in L_1$ be two annotation values such that $x \leq_1 x'$. Since $D_1$ is a lattice, $x = x \otimes_1 x'$. Then $f(x) = f(x \otimes_1 x') \geq_2 f(x) \oplus_2 f(x')$ and, thus, $f(x) \geq_2 f(x')$.
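The argument of Lemma Appendix A.10 can be illustrated on the function $\hat{A}$ of a small example. The sketch below only checks the inequality $f(x \otimes_1 x') \geq_2 f(x) \oplus_2 f(x')$ that the proof relies on, together with the resulting antitonicity; it does not encode the paper's full definition of a quasi-homomorphism. The domains are assumptions for illustration: subsets of a hypothetical four-point time line for $D_1$ and $\max$/$\min$ on $[0,1]$ for $D_2$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption): D1 = subsets of {0, 1, 2, 3}
# (⊕1 = union, ⊗1 = intersection); D2 = degrees in [0, 1] (⊕2 = max, ⊗2 = min).
T = frozenset(range(4))


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def hat(A, x):
    """Brute-force Â(x) over all J ⊆ A."""
    best = Fraction(0)
    for J in subsets(A):
        cover, val = frozenset(), Fraction(1)
        for a, b in J:
            cover, val = cover | a, min(val, b)
        if x <= cover:
            best = max(best, val)
    return best


A = {(frozenset({0, 1}), Fraction(4, 5)), (frozenset({1, 2}), Fraction(3, 5))}
POINTS = [frozenset(s) for s in subsets(sorted(T))]


def f(x):
    return hat(A, x)


# The inequality used in the proof: f(x ⊗1 x') ≥2 f(x) ⊕2 f(x').
assert all(f(x & x2) >= max(f(x), f(x2)) for x in POINTS for x2 in POINTS)
# Its consequence (Lemma A.10): x ≤1 x' implies f(x) ≥2 f(x').
assert all(f(x) >= f(x2) for x in POINTS for x2 in POINTS if x <= x2)
```

Intuitively, a smaller (more specific) annotation in $D_1$ is covered by more subsets $J$, so the lub defining $\hat{A}$ ranges over more values and can only grow.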

Using the previous lemmas, we can prove two additional lemmas that lead to the final proof:

Lemma Appendix A.11. If $(x,y) \in \mathrm{Normalise}(A)$ then $x = \check{A}(x)$ and $y = \hat{A}(x)$.

Proof Appendix A.12. Let $(x,y) \in \mathrm{Normalise}(A)$. Consequently, $(x,y) \in \mathrm{Saturate}(A)$. Moreover, by definition of Saturate, the pair $(\check{A}(x), \hat{A}(x))$ must belong to $\mathrm{Saturate}(A)$. Additionally, from Lemma Appendix A.8 and Lemma Appendix A.9 we have that $x \leq_1 \check{A}(x)$ and $y \leq_2 \hat{A}(x)$, which implies that $(x,y)$ would be eliminated by the Reduce algorithm during normalisation, unless $x = \check{A}(x)$ and $y = \hat{A}(x)$.

Lemma Appendix A.12. For all $x \in L_1$, there exists $u \in L_1$ such that $(u, \hat{A}(x)) \in \mathrm{Normalise}(A)$ and $x \leq_1 u$.



Proof Appendix A.13. Let $x \in L_1$. Again, $(\check{A}(x), \hat{A}(x)) \in \mathrm{Saturate}(A)$. Then, due to the reduction algorithm, there must exist $(u,v) \in \mathrm{Normalise}(A)$ such that $u \geq_1 \check{A}(x)$ and $v \geq_2 \hat{A}(x)$. We can consider the following assertions:



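$$
\begin{array}{lll}
D_1 \text{ is a lattice} & \text{(by hypothesis)} & \text{(H1)}\\
\check{A}(x) \leq_1 u & \text{(due to Reduce)} & \text{(R1)}\\
\hat{A}(x) \leq_2 v & \text{(due to Reduce)} & \text{(R2)}\\
\hat{A} \text{ is antitone} & \text{(from (H1) and Lemma Appendix A.10)} & \text{(A1)}\\
x \leq_1 \check{A}(x) & \text{(from Lemma Appendix A.9)} & \text{(A2)}\\
\hat{A}(x) \geq_2 \hat{A}(\check{A}(x)) & \text{(from (A1) and (A2))} & \text{(A3)}\\
\hat{A}(\check{A}(x)) \geq_2 \hat{A}(u) & \text{(from (R1) and (A1))} & \text{(A4)}\\
\hat{A}(u) \geq_2 v & \text{(from Lemma Appendix A.8 applied to Normalise}(A)\text{, with Proposition Appendix A.4)} & \text{(A5)}\\
\hat{A}(x) = v & \text{(from (A3), (A4), (A5) and (R2))} & \text{(C1)}\\
x \leq_1 u & \text{(from (A2) and (R1))} & \text{(C2)}
\end{array}
$$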



Assertions (C1) and (C2) establish the lemma.
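Lemmas Appendix A.11 and Appendix A.12 can be observed on a normalised example. The following Python sketch is hypothetical throughout: it assumes $D_1$ = subsets of a four-point time line ($\oplus_1 = \cup$, $\otimes_1 = \cap$) and $D_2$ = degrees in $[0,1]$ with $\max$/$\min$, and its `normalise` applies only the binary saturation steps of Lemmas Appendix A.5 and Appendix A.6 before reducing. Because this `normalise` drops pairs of the form $(\bot_1, y)$ and $(x, \bot_2)$, the check of Lemma Appendix A.12 is restricted to points $x \neq \bot_1$ with $\hat{A}(x) \neq \bot_2$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy instance (assumption): D1 = subsets of {0, 1, 2, 3}
# (⊕1 = union, ⊗1 = intersection); D2 = degrees in [0, 1] (⊕2 = max, ⊗2 = min).
T = frozenset(range(4))
BOT1, BOT2 = frozenset(), Fraction(0)


def subsets(items):
    """Yield every subset J of a finite collection, as a tuple."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def join1(J):
    """⊕1 of the first components of J."""
    return frozenset().union(*(a for a, _ in J))


def K(A, x):
    """K_x^A: the subsets J of A whose first components join to at least x."""
    return [J for J in subsets(A) if x <= join1(J)]


def hat(A, x):
    """Â(x): max over J in K_x^A of the min of J's degrees."""
    return max((min((b for _, b in J), default=Fraction(1)) for J in K(A, x)),
               default=BOT2)


def check(A, x):
    """Ǎ(x): intersection over J in K_x^A of the join of J's first components."""
    return T.intersection(*(join1(J) for J in K(A, x)))


def normalise(A):
    """Hypothetical Saturate (binary steps of Lemmas A.5/A.6) followed by Reduce."""
    R = set(A)
    while True:
        new = {(a | c, min(b, d)) for a, b in R for c, d in R}
        new |= {(a & c, max(b, d)) for a, b in R for c, d in R}
        if new <= R:
            break
        R |= new
    return {(a, b) for a, b in R
            if a != BOT1 and b != BOT2
            and not any((a2, b2) != (a, b) and a <= a2 and b <= b2
                        for a2, b2 in R)}


A = {(frozenset({0, 1}), Fraction(4, 5)), (frozenset({1, 2}), Fraction(3, 5))}
N = normalise(A)
POINTS = [frozenset(s) for s in subsets(sorted(T))]

# Lemma A.11: each normalised pair (x, y) satisfies x = Ǎ(x) and y = Â(x).
assert all(x == check(A, x) and y == hat(A, x) for x, y in N)
# Lemma A.12, restricted to x ≠ ⊥1 with Â(x) ≠ ⊥2.
assert all(any(x <= u and v == hat(A, x) for u, v in N)
           for x in POINTS if x != BOT1 and hat(A, x) != BOT2)
```

On this instance every normalised pair is a fixed point of $x \mapsto (\check{A}(x), \hat{A}(x))$, as Lemma Appendix A.11 predicts.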






